IE 360 Project - Group 12

Mustafa Said Kesici, Hamza Pamukçu, İbrahim Bülbül

Introduction

Problem Description

The primary objective of this project is to predict the hourly solar power production of the Edikli GES (Güneş Enerjisi Santrali) solar power plant located in Niğde. The forecasting period spans from May 14 to June 4, with 24 hourly predictions generated for each day. The forecasting model uses production data available up to two days before the target date, and the data is refreshed daily so that each forecast is based on the most recent observations.
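
The two-day rule described above can be sketched as a simple filter. This is a minimal illustration rather than the project's own code: the helper name `available_history`, the toy table, and its dates (including the year) are assumptions made for the example.

```r
library(data.table)

# Hypothetical helper (not part of the original code): keep only the rows that
# would be available when forecasting `target_date`, i.e. up to two days earlier.
available_history <- function(dt, target_date) {
  cutoff <- as.Date(target_date) - 2
  dt[as.Date(date) <= cutoff]
}

# Toy production table: five consecutive days, two hours per day (made-up values).
toy <- data.table(
  date = as.Date("2024-05-10") + rep(0:4, each = 2),
  hour = rep(c(11L, 12L), times = 5),
  production = c(8.1, 9.0, 7.5, 8.8, 6.9, 7.7, 8.3, 9.1, 7.9, 8.6)
)

# Forecasting May 14 may only use data up to and including May 12.
history <- available_history(toy, target_date = "2024-05-14")
print(max(history$date))
```

Rerunning the filter with the next target date shifts the cutoff by one day, which mirrors the daily refresh of the data.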

Data Description

The data utilized in this project comprises two main components: weather data and solar power production data. The weather data includes variables such as downward shortwave radiation flux (dswrf_surface), cloud cover at various atmospheric levels (tcdc_low.cloud.layer, tcdc_middle.cloud.layer, tcdc_high.cloud.layer), and temperature at the surface (tmp_surface). This weather data is recorded hourly and provides crucial information on the environmental conditions affecting solar power production.

The solar power production data includes the hourly production values recorded at the Edikli GES plant. Both datasets are merged using a common datetime index, ensuring that each production record is associated with the corresponding weather conditions. This combined dataset is essential for developing accurate predictive models, as it allows for the analysis of how weather variables influence solar power output.

Summary of the Proposed Approach

Our approach involves using a combination of weather variables and historical production data to build predictive models. We start with data preprocessing to clean and organize the data, followed by exploratory data analysis to identify key patterns and relationships. We then build several linear regression models, gradually adding more variables to improve the accuracy of our predictions.
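
As a rough sketch of what "gradually adding more variables" can look like, the example below fits two nested linear models on simulated data and compares their fit. The data, the column names `dswrf` and `cloud`, and the coefficients used to simulate production are all assumptions for illustration; they are not results from the Edikli GES data.

```r
set.seed(360)

# Simulated predictors and response (arbitrary coefficients, illustration only).
n <- 200
sim <- data.frame(
  dswrf = runif(n, 0, 900),  # stand-in for downward shortwave radiation
  cloud = runif(n, 0, 100)   # stand-in for cloud cover in percent
)
sim$production <- 0.01 * sim$dswrf - 0.02 * sim$cloud + rnorm(n, sd = 0.5)

# Start with one predictor, then add a second one.
fit_base <- lm(production ~ dswrf, data = sim)
fit_full <- lm(production ~ dswrf + cloud, data = sim)

# Compare the two models; adjusted R-squared penalizes the extra variable.
comparison <- c(
  base = summary(fit_base)$adj.r.squared,
  full = summary(fit_full)$adj.r.squared
)
print(comparison)
```

Plain R-squared never decreases when a variable is added to a nested model, so adjusted R-squared or an out-of-sample error is a more informative basis for deciding whether a variable is worth keeping.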

require(data.table)
## Loading required package: data.table
require(lubridate)
## Loading required package: lubridate
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:data.table':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday, week,
##     yday, year
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
require(forecast)
## Loading required package: forecast
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
require(skimr)
## Loading required package: skimr
require(repr)
## Loading required package: repr
require(openxlsx)
## Loading required package: openxlsx
require(ggplot2)
## Loading required package: ggplot2
require(GGally)
## Loading required package: GGally
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
require(ggcorrplot)
## Loading required package: ggcorrplot
library(readxl)

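These libraries cover data manipulation (data.table), date-time handling (lubridate), time series forecasting (forecast), summary statistics (skimr), plot sizing (repr), Excel input and output (openxlsx, readxl), and visualization (ggplot2, GGally, ggcorrplot).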

todays_date=Sys.Date()
forecast_date=todays_date+1


options(repr.plot.width=12.7, repr.plot.height=8.5)

data_path2='/Users/kesici/Downloads/processed_weather.csv'

weather_info=fread(data_path2)

weather_info[,datetime:=ymd(date)+dhours(hour)]
weather_info=weather_info[order(datetime)]

head(weather_info,25)
##           date  hour   lat   lon dswrf_surface tcdc_low.cloud.layer
##         <IDat> <int> <num> <num>         <num>                <num>
##  1: 2022-01-01     4 38.00 35.00             0                  0.2
##  2: 2022-01-01     4 38.50 35.25             0                  1.6
##  3: 2022-01-01     4 37.75 34.75             0                  4.4
##  4: 2022-01-01     4 38.75 34.50             0                  5.0
##  5: 2022-01-01     4 37.75 34.50             0                  0.0
##  6: 2022-01-01     4 38.25 34.75             0                  0.0
##  7: 2022-01-01     4 38.75 35.00             0                  5.0
##  8: 2022-01-01     4 38.50 35.00             0                  1.7
##  9: 2022-01-01     4 38.25 34.50             0                  5.0
## 10: 2022-01-01     4 37.75 35.00             0                  1.7
## 11: 2022-01-01     4 38.00 34.50             0                  0.0
## 12: 2022-01-01     4 38.50 34.50             0                  2.9
## 13: 2022-01-01     4 37.75 35.25             0                  1.0
## 14: 2022-01-01     4 37.75 35.50             0                  3.5
## 15: 2022-01-01     4 38.00 34.75             0                  3.0
## 16: 2022-01-01     4 38.75 35.50             0                  1.4
## 17: 2022-01-01     4 38.25 35.00             0                  0.0
## 18: 2022-01-01     4 38.25 35.50             0                  5.0
## 19: 2022-01-01     4 38.75 35.25             0                  2.1
## 20: 2022-01-01     4 38.00 35.25             0                  0.0
## 21: 2022-01-01     4 38.25 35.25             0                  4.0
## 22: 2022-01-01     4 38.50 35.50             0                  0.0
## 23: 2022-01-01     4 38.00 35.50             0                  4.1
## 24: 2022-01-01     4 38.75 34.75             0                  5.0
## 25: 2022-01-01     4 38.50 34.75             0                  3.0
##           date  hour   lat   lon dswrf_surface tcdc_low.cloud.layer
##     tcdc_middle.cloud.layer tcdc_high.cloud.layer tcdc_entire.atmosphere
##                       <num>                 <num>                  <num>
##  1:                     5.0                   2.1                    8.2
##  2:                     0.0                   1.6                    3.3
##  3:                    21.8                   6.9                   32.7
##  4:                     0.0                   5.0                   14.7
##  5:                    36.1                   5.8                   41.4
##  6:                     0.0                   7.5                    9.1
##  7:                     0.9                   9.7                   18.3
##  8:                     0.0                   5.0                    8.8
##  9:                     0.0                   5.0                   13.2
## 10:                    25.1                   5.0                   32.3
## 11:                     5.0                   7.6                   14.0
## 12:                     0.0                   5.0                   12.5
## 13:                    13.9                   5.0                   21.7
## 14:                    19.0                   5.0                   28.0
## 15:                     5.0                   7.2                   15.1
## 16:                     0.1                   5.0                    6.7
## 17:                     0.0                   1.7                    1.7
## 18:                     0.0                   0.0                    5.2
## 19:                     1.4                   5.9                   10.6
## 20:                     5.1                   4.1                   10.6
## 21:                     0.0                   0.0                    4.0
## 22:                     0.0                   0.0                    0.0
## 23:                     9.5                   5.0                   18.2
## 24:                     0.7                   5.0                   15.7
## 25:                     0.0                   5.0                   11.4
##     tcdc_middle.cloud.layer tcdc_high.cloud.layer tcdc_entire.atmosphere
##     uswrf_top_of_atmosphere csnow_surface dlwrf_surface uswrf_surface
##                       <num>         <int>         <num>         <num>
##  1:                       0             0       219.279             0
##  2:                       0             0       227.479             0
##  3:                       0             0       227.179             0
##  4:                       0             0       241.779             0
##  5:                       0             0       241.879             0
##  6:                       0             0       230.579             0
##  7:                       0             0       236.379             0
##  8:                       0             0       228.379             0
##  9:                       0             0       228.079             0
## 10:                       0             0       217.179             0
## 11:                       0             0       226.179             0
## 12:                       0             0       234.379             0
## 13:                       0             0       214.779             0
## 14:                       0             0       235.679             0
## 15:                       0             0       225.079             0
## 16:                       0             0       232.479             0
## 17:                       0             0       230.679             0
## 18:                       0             0       222.979             0
## 19:                       0             0       234.279             0
## 20:                       0             0       209.479             0
## 21:                       0             0       232.879             0
## 22:                       0             0       211.779             0
## 23:                       0             0       221.879             0
## 24:                       0             0       239.679             0
## 25:                       0             0       229.579             0
##     uswrf_top_of_atmosphere csnow_surface dlwrf_surface uswrf_surface
##     tmp_surface            datetime
##           <num>              <POSc>
##  1:     268.804 2022-01-01 04:00:00
##  2:     271.204 2022-01-01 04:00:00
##  3:     268.304 2022-01-01 04:00:00
##  4:     271.404 2022-01-01 04:00:00
##  5:     272.504 2022-01-01 04:00:00
##  6:     271.204 2022-01-01 04:00:00
##  7:     270.904 2022-01-01 04:00:00
##  8:     270.504 2022-01-01 04:00:00
##  9:     270.104 2022-01-01 04:00:00
## 10:     265.604 2022-01-01 04:00:00
## 11:     268.204 2022-01-01 04:00:00
## 12:     271.404 2022-01-01 04:00:00
## 13:     264.704 2022-01-01 04:00:00
## 14:     269.304 2022-01-01 04:00:00
## 15:     269.004 2022-01-01 04:00:00
## 16:     271.204 2022-01-01 04:00:00
## 17:     271.204 2022-01-01 04:00:00
## 18:     268.304 2022-01-01 04:00:00
## 19:     270.304 2022-01-01 04:00:00
## 20:     265.004 2022-01-01 04:00:00
## 21:     271.304 2022-01-01 04:00:00
## 22:     262.204 2022-01-01 04:00:00
## 23:     265.604 2022-01-01 04:00:00
## 24:     271.404 2022-01-01 04:00:00
## 25:     270.804 2022-01-01 04:00:00
##     tmp_surface            datetime
data_path='/Users/kesici/Downloads/production 2.csv'
production=fread(data_path)
production[,datetime:=ymd(date)+dhours(hour)]
production=production[order(datetime)]


head(production,25)
##           date  hour production            datetime
##         <IDat> <int>      <num>              <POSc>
##  1: 2022-01-01     0       0.00 2022-01-01 00:00:00
##  2: 2022-01-01     1       0.00 2022-01-01 01:00:00
##  3: 2022-01-01     2       0.00 2022-01-01 02:00:00
##  4: 2022-01-01     3       0.00 2022-01-01 03:00:00
##  5: 2022-01-01     4       0.00 2022-01-01 04:00:00
##  6: 2022-01-01     5       0.00 2022-01-01 05:00:00
##  7: 2022-01-01     6       0.00 2022-01-01 06:00:00
##  8: 2022-01-01     7       0.00 2022-01-01 07:00:00
##  9: 2022-01-01     8       3.40 2022-01-01 08:00:00
## 10: 2022-01-01     9       6.80 2022-01-01 09:00:00
## 11: 2022-01-01    10       9.38 2022-01-01 10:00:00
## 12: 2022-01-01    11       7.65 2022-01-01 11:00:00
## 13: 2022-01-01    12       6.80 2022-01-01 12:00:00
## 14: 2022-01-01    13       5.10 2022-01-01 13:00:00
## 15: 2022-01-01    14       5.10 2022-01-01 14:00:00
## 16: 2022-01-01    15       1.70 2022-01-01 15:00:00
## 17: 2022-01-01    16       0.00 2022-01-01 16:00:00
## 18: 2022-01-01    17       0.00 2022-01-01 17:00:00
## 19: 2022-01-01    18       0.00 2022-01-01 18:00:00
## 20: 2022-01-01    19       0.00 2022-01-01 19:00:00
## 21: 2022-01-01    20       0.00 2022-01-01 20:00:00
## 22: 2022-01-01    21       0.00 2022-01-01 21:00:00
## 23: 2022-01-01    22       0.00 2022-01-01 22:00:00
## 24: 2022-01-01    23       0.00 2022-01-01 23:00:00
## 25: 2022-01-02     0       0.00 2022-01-02 00:00:00
##           date  hour production            datetime
str(production)
## Classes 'data.table' and 'data.frame':   21000 obs. of  4 variables:
##  $ date      : IDate, format: "2022-01-01" "2022-01-01" ...
##  $ hour      : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ production: num  0 0 0 0 0 0 0 0 3.4 6.8 ...
##  $ datetime  : POSIXct, format: "2022-01-01 00:00:00" "2022-01-01 01:00:00" ...
##  - attr(*, ".internal.selfref")=<externalptr>

After loading the required libraries, the code sets the current date and the forecast date for generating predictions. It adjusts the plot dimensions for better visualization. The weather data is read from a CSV file, and a new datetime column is created by combining the date and hour columns. This data is then sorted by the datetime column. Similarly, the solar power production data is read from another CSV file, a datetime column is created, and the data is sorted accordingly. Displaying the first few rows of both datasets and examining their structure ensures that the data is correctly formatted and ready for further analysis.
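
The datetime construction described above can be checked on a small table. This is a minimal sketch: the toy rows are made up, and the dates are given as character strings for simplicity, whereas `fread` reads them as `IDate` in the actual data.

```r
library(data.table)
library(lubridate)

# Toy table with a date and an hour-of-day column (made-up rows).
toy <- data.table(
  date = c("2022-01-01", "2022-01-01", "2022-01-02"),
  hour = c(4L, 23L, 0L)
)

# Same construction as in the report: parse the date, then add the hour as a duration.
toy[, datetime := ymd(date) + dhours(hour)]
toy <- toy[order(datetime)]

print(toy)
```

Adding a duration to a date yields a POSIXct timestamp in UTC, so hour 23 of one day is followed directly by hour 0 of the next after sorting.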

# Average each weather variable over the 25 grid points reported for every date and hour.
# Note: uswrf_surface is stored under the name swrf_surface, which the outputs below rely on.
hourly_series=weather_info[,list(
    dswrf_surface=sum(dswrf_surface)/25,
    tcdc_low.cloud.layer=sum(tcdc_low.cloud.layer)/25,
    tcdc_middle.cloud.layer=sum(tcdc_middle.cloud.layer)/25,
    tcdc_high.cloud.layer=sum(tcdc_high.cloud.layer)/25,
    tcdc_entire.atmosphere=sum(tcdc_entire.atmosphere)/25,
    uswrf_top_of_atmosphere=sum(uswrf_top_of_atmosphere)/25,
    csnow_surface=sum(csnow_surface)/25,
    dlwrf_surface=sum(dlwrf_surface)/25,
    swrf_surface=sum(uswrf_surface)/25,
    tmp_surface=sum(tmp_surface)/25),
  list(date,hour)]

hourly_series[,datetime:=ymd(date)+dhours(hour)]
head(hourly_series)
##          date  hour dswrf_surface tcdc_low.cloud.layer tcdc_middle.cloud.layer
##        <IDat> <int>         <num>                <num>                   <num>
## 1: 2022-01-01     4        0.0000                2.384                   5.944
## 2: 2022-01-01     5        0.0000                2.784                   4.324
## 3: 2022-01-01     6        0.0000                2.964                   5.372
## 4: 2022-01-01     7        0.0000                3.284                   9.212
## 5: 2022-01-01     8        0.0000                3.672                  11.252
## 6: 2022-01-01     9        7.3688                4.120                  10.880
##    tcdc_high.cloud.layer tcdc_entire.atmosphere uswrf_top_of_atmosphere
##                    <num>                  <num>                   <num>
## 1:                 4.604                 14.296                 0.00000
## 2:                10.636                 19.272                 0.00000
## 3:                11.688                 21.772                 0.00000
## 4:                20.736                 31.992                 0.00000
## 5:                26.432                 38.376                 0.00000
## 6:                35.088                 45.856                 8.96704
##    csnow_surface dlwrf_surface swrf_surface tmp_surface            datetime
##            <num>         <num>        <num>       <num>              <POSc>
## 1:             0       227.999       0.0000     269.220 2022-01-01 04:00:00
## 2:             0       227.774       0.0000     269.104 2022-01-01 05:00:00
## 3:             0       227.764       0.0000     269.035 2022-01-01 06:00:00
## 4:             0       228.196       0.0000     269.001 2022-01-01 07:00:00
## 5:             0       228.657       0.0000     269.002 2022-01-01 08:00:00
## 6:             0       229.416       2.4128     271.634 2022-01-01 09:00:00

The code above aggregates the weather data into one row per date and hour by averaging each variable over the 25 grid points (latitude and longitude pairs) surrounding the plant.
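
Dividing a sum by a fixed 25 assumes that every hour has exactly 25 grid points. A possible alternative, sketched below on toy data, uses `mean` over `.SD` and records the number of grid points per hour so that the assumption can be checked. The toy table (3 grid points instead of 25) and the names `toy_weather`, `value_cols`, and `n_points` are assumptions for illustration.

```r
library(data.table)

# Toy weather table: 2 hours x 3 grid points (the real data has 25 per hour).
toy_weather <- data.table(
  date = as.IDate("2022-01-01"),
  hour = rep(c(9L, 10L), each = 3),
  lat = rep(c(37.75, 38.00, 38.25), times = 2),
  lon = 35.00,
  dswrf_surface = c(6, 7, 8, 100, 110, 120),
  tmp_surface = c(270, 271, 272, 274, 275, 276)
)

# Columns to average; lat and lon are deliberately left out.
value_cols <- c("dswrf_surface", "tmp_surface")

# One row per date and hour: the grid-point count plus the mean of each variable.
hourly_toy <- toy_weather[, c(list(n_points = .N), lapply(.SD, mean)),
                          by = .(date, hour), .SDcols = value_cols]

print(hourly_toy)
```

If `n_points` differs from the expected count for some hour, the fixed divisor and `mean` give different results, which makes the check a cheap safeguard.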

mergeddata<-merge(hourly_series,production,by="datetime",all.x=T)
head(mergeddata)
## Key: <datetime>
##               datetime     date.x hour.x dswrf_surface tcdc_low.cloud.layer
##                 <POSc>     <IDat>  <int>         <num>                <num>
## 1: 2022-01-01 04:00:00 2022-01-01      4        0.0000                2.384
## 2: 2022-01-01 05:00:00 2022-01-01      5        0.0000                2.784
## 3: 2022-01-01 06:00:00 2022-01-01      6        0.0000                2.964
## 4: 2022-01-01 07:00:00 2022-01-01      7        0.0000                3.284
## 5: 2022-01-01 08:00:00 2022-01-01      8        0.0000                3.672
## 6: 2022-01-01 09:00:00 2022-01-01      9        7.3688                4.120
##    tcdc_middle.cloud.layer tcdc_high.cloud.layer tcdc_entire.atmosphere
##                      <num>                 <num>                  <num>
## 1:                   5.944                 4.604                 14.296
## 2:                   4.324                10.636                 19.272
## 3:                   5.372                11.688                 21.772
## 4:                   9.212                20.736                 31.992
## 5:                  11.252                26.432                 38.376
## 6:                  10.880                35.088                 45.856
##    uswrf_top_of_atmosphere csnow_surface dlwrf_surface swrf_surface tmp_surface
##                      <num>         <num>         <num>        <num>       <num>
## 1:                 0.00000             0       227.999       0.0000     269.220
## 2:                 0.00000             0       227.774       0.0000     269.104
## 3:                 0.00000             0       227.764       0.0000     269.035
## 4:                 0.00000             0       228.196       0.0000     269.001
## 5:                 0.00000             0       228.657       0.0000     269.002
## 6:                 8.96704             0       229.416       2.4128     271.634
##        date.y hour.y production
##        <IDat>  <int>      <num>
## 1: 2022-01-01      4        0.0
## 2: 2022-01-01      5        0.0
## 3: 2022-01-01      6        0.0
## 4: 2022-01-01      7        0.0
## 5: 2022-01-01      8        3.4
## 6: 2022-01-01      9        6.8
newdata=mergeddata[,-c("date.y","hour.y")]
basedata=newdata[,-c("date.x","hour.x","datetime")]
head(newdata)
## Key: <datetime>
##               datetime     date.x hour.x dswrf_surface tcdc_low.cloud.layer
##                 <POSc>     <IDat>  <int>         <num>                <num>
## 1: 2022-01-01 04:00:00 2022-01-01      4        0.0000                2.384
## 2: 2022-01-01 05:00:00 2022-01-01      5        0.0000                2.784
## 3: 2022-01-01 06:00:00 2022-01-01      6        0.0000                2.964
## 4: 2022-01-01 07:00:00 2022-01-01      7        0.0000                3.284
## 5: 2022-01-01 08:00:00 2022-01-01      8        0.0000                3.672
## 6: 2022-01-01 09:00:00 2022-01-01      9        7.3688                4.120
##    tcdc_middle.cloud.layer tcdc_high.cloud.layer tcdc_entire.atmosphere
##                      <num>                 <num>                  <num>
## 1:                   5.944                 4.604                 14.296
## 2:                   4.324                10.636                 19.272
## 3:                   5.372                11.688                 21.772
## 4:                   9.212                20.736                 31.992
## 5:                  11.252                26.432                 38.376
## 6:                  10.880                35.088                 45.856
##    uswrf_top_of_atmosphere csnow_surface dlwrf_surface swrf_surface tmp_surface
##                      <num>         <num>         <num>        <num>       <num>
## 1:                 0.00000             0       227.999       0.0000     269.220
## 2:                 0.00000             0       227.774       0.0000     269.104
## 3:                 0.00000             0       227.764       0.0000     269.035
## 4:                 0.00000             0       228.196       0.0000     269.001
## 5:                 0.00000             0       228.657       0.0000     269.002
## 6:                 8.96704             0       229.416       2.4128     271.634
##    production
##         <num>
## 1:        0.0
## 2:        0.0
## 3:        0.0
## 4:        0.0
## 5:        3.4
## 6:        6.8
head(basedata)
##    dswrf_surface tcdc_low.cloud.layer tcdc_middle.cloud.layer
##            <num>                <num>                   <num>
## 1:        0.0000                2.384                   5.944
## 2:        0.0000                2.784                   4.324
## 3:        0.0000                2.964                   5.372
## 4:        0.0000                3.284                   9.212
## 5:        0.0000                3.672                  11.252
## 6:        7.3688                4.120                  10.880
##    tcdc_high.cloud.layer tcdc_entire.atmosphere uswrf_top_of_atmosphere
##                    <num>                  <num>                   <num>
## 1:                 4.604                 14.296                 0.00000
## 2:                10.636                 19.272                 0.00000
## 3:                11.688                 21.772                 0.00000
## 4:                20.736                 31.992                 0.00000
## 5:                26.432                 38.376                 0.00000
## 6:                35.088                 45.856                 8.96704
##    csnow_surface dlwrf_surface swrf_surface tmp_surface production
##            <num>         <num>        <num>       <num>      <num>
## 1:             0       227.999       0.0000     269.220        0.0
## 2:             0       227.774       0.0000     269.104        0.0
## 3:             0       227.764       0.0000     269.035        0.0
## 4:             0       228.196       0.0000     269.001        0.0
## 5:             0       228.657       0.0000     269.002        3.4
## 6:             0       229.416       2.4128     271.634        6.8

The code above merges the aggregated weather data with the production data on the datetime column. Because the merge keeps every weather row (all.x=T), hours with no matching production record remain in the dataset with production set to NA.

Next, the duplicated date and hour columns created by the merge are removed. The first few rows of ‘newdata’ and ‘basedata’ are then displayed to confirm the column removals and the final structure of the datasets.
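
The effect of the merge can be seen on a small example. This is a minimal sketch with made-up values; the table names `weather_toy`, `production_toy`, `merged_toy`, and `model_toy` are assumptions for illustration.

```r
library(data.table)

# Three weather hours, but production is known only for the first two.
start <- as.POSIXct("2022-01-01 08:00:00", tz = "UTC")
weather_toy <- data.table(
  datetime = start + 3600 * 0:2,
  dswrf_surface = c(0, 7.4, 50.2)
)
production_toy <- data.table(
  datetime = start + 3600 * 0:1,
  production = c(3.4, 6.8)
)

# Left join: every weather row is kept, unmatched production becomes NA.
merged_toy <- merge(weather_toy, production_toy, by = "datetime", all.x = TRUE)

# Drop the identifier column to leave only modelling variables.
model_toy <- merged_toy[, -c("datetime")]

print(merged_toy)
print(sum(is.na(merged_toy$production)))
```

Rows with missing production are the hours still to be forecast, so they should be excluded when fitting a model and kept when generating predictions.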

ggpairs(basedata)
## Warning: Removed 1 rows containing non-finite values (`stat_density()`).
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 2 rows containing missing values
## Warning: Removed 2 rows containing missing values (`geom_point()`).
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 119 rows containing missing values
## Warning: Removed 119 rows containing missing values (`geom_point()`).

ggpairs repeats warnings of these three kinds for every panel of the pairs plot. The reported counts range from 1 to 6 rows for most panels and from 119 to 122 rows for the rest, which appear to be the panels involving production.
## Warning: Removed 118 rows containing non-finite values (`stat_density()`).

# Drop columns not needed for the correlation analysis; intersect() skips
# names that are absent from basedata (hour and datetime are not present here)
drop_cols <- intersect(c("csnow_surface","hour","datetime"), names(basedata))
basedata <- basedata[, setdiff(names(basedata), drop_cols), with=FALSE]
head(basedata,25)
##     dswrf_surface tcdc_low.cloud.layer tcdc_middle.cloud.layer
##             <num>                <num>                   <num>
##  1:        0.0000                2.384                   5.944
##  2:        0.0000                2.784                   4.324
##  3:        0.0000                2.964                   5.372
##  4:        0.0000                3.284                   9.212
##  5:        0.0000                3.672                  11.252
##  6:        7.3688                4.120                  10.880
##  7:      180.1384                4.180                  13.748
##  8:      254.7928                3.972                  16.520
##  9:      312.5520                4.408                  23.468
## 10:      347.4144                5.348                  32.996
## 11:      364.6544                6.772                  39.460
## 12:      362.2984                9.116                  43.452
## 13:      221.4584               37.100                  82.296
## 14:      157.6448               37.032                  78.252
## 15:      107.3080               35.876                  74.752
## 16:       80.4792               35.080                  72.480
## 17:       64.3864               36.096                  75.236
## 18:       53.6528               37.028                  77.632
## 19:        0.0000               49.732                  90.688
## 20:        0.0000               56.532                  92.580
## 21:        0.0000               66.812                  94.140
## 22:        0.0000               71.096                  94.212
## 23:        0.0000               74.580                  94.000
## 24:        0.0000               77.660                  94.556
## 25:        0.0000               93.816                  91.588
##     dswrf_surface tcdc_low.cloud.layer tcdc_middle.cloud.layer
##     tcdc_high.cloud.layer tcdc_entire.atmosphere uswrf_top_of_atmosphere
##                     <num>                  <num>                   <num>
##  1:                 4.604                 14.296                 0.00000
##  2:                10.636                 19.272                 0.00000
##  3:                11.688                 21.772                 0.00000
##  4:                20.736                 31.992                 0.00000
##  5:                26.432                 38.376                 0.00000
##  6:                35.088                 45.856                 8.96704
##  7:                80.764                 85.364               135.20448
##  8:                69.392                 75.468               152.98944
##  9:                60.516                 70.368               167.55392
## 10:                54.852                 70.452               180.69312
## 11:                45.880                 71.136               187.03872
## 12:                39.864                 71.612               188.88640
## 13:                12.680                 89.116               168.56448
## 14:                17.376                 85.688               133.18080
## 15:                27.016                 83.300                92.25216
## 16:                37.268                 83.508                69.18720
## 17:                44.544                 85.644                55.34976
## 18:                49.756                 87.520                46.12352
## 19:                68.948                 98.276                 0.00000
## 20:                69.140                 98.380                 0.00000
## 21:                76.844                 98.760                 0.00000
## 22:                80.404                 98.492                 0.00000
## 23:                84.192                 98.792                 0.00000
## 24:                86.644                 98.980                 0.00000
## 25:                99.228                100.000                 0.00000
##     tcdc_high.cloud.layer tcdc_entire.atmosphere uswrf_top_of_atmosphere
##     dlwrf_surface swrf_surface tmp_surface production
##             <num>        <num>       <num>      <num>
##  1:       227.999      0.00000     269.220       0.00
##  2:       227.774      0.00000     269.104       0.00
##  3:       227.764      0.00000     269.035       0.00
##  4:       228.196      0.00000     269.001       0.00
##  5:       228.657      0.00000     269.002       3.40
##  6:       229.416      2.41280     271.634       6.80
##  7:       237.291     59.18848     275.786       9.38
##  8:       237.387     82.53120     278.553       7.65
##  9:       238.387     99.73056     280.215       6.80
## 10:       240.887    109.43168     280.609       5.10
## 11:       243.015    113.51040     280.277       5.10
## 12:       245.296    112.10304     279.165       1.70
## 13:       260.756     66.06976     277.059       0.00
## 14:       258.960     47.22432     273.467       0.00
## 15:       257.256     32.11968     271.659       0.00
## 16:       256.506     24.08960     271.537       0.00
## 17:       257.705     19.27104     271.715       0.00
## 18:       259.485     16.06016     271.855       0.00
## 19:       272.875      0.00000     272.029       0.00
## 20:       274.722      0.00000     272.231       0.00
## 21:       280.774      0.00000     272.513       0.00
## 22:       283.754      0.00000     272.616       0.00
## 23:       287.346      0.00000     273.024       0.00
## 24:       291.060      0.00000     273.426       0.00
## 25:       312.804      0.00000     273.605       0.00
##     dlwrf_surface swrf_surface tmp_surface production
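# production contains missing values, so use pairwise deletion to avoid NA correlations
corr <- round(cor(basedata, use="pairwise.complete.obs"),1)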

The code above carries out exploratory data analysis: it visualizes the pairwise relationships between the variables and then computes their correlations.

Columns that are not needed for the correlation analysis are dropped from basedata first. csnow_surface is removed; hour and datetime are also targeted, but they are not present in basedata.

Finally, we calculate the correlation matrix for the remaining variables in basedata, rounded to one decimal place.
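
As a side note on the correlation step, cor() returns NA for any pair that involves a column with missing values unless its use argument is set. The sketch below illustrates the difference on synthetic data. The toy table and its column names (x1, x2, y) are assumptions made for illustration and are not part of the project data.

```r
# Toy data: y depends on x1, x2 is pure noise. Values are synthetic.
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 2 * toy$x1 + rnorm(n, sd = 0.5)
toy$y[1:10] <- NA   # mimic hours with unknown production

# Default behaviour: correlations between y and the other columns become NA
corr_default <- round(cor(toy), 1)

# Pairwise deletion: each correlation uses the rows where both columns are observed
corr_pairwise <- round(cor(toy, use = "pairwise.complete.obs"), 1)

print(corr_default)
print(corr_pairwise)
```

Pairwise deletion keeps as many rows as possible for each pair, whereas use = "complete.obs" would restrict every pair to the rows that are complete in all columns.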

daily_series=newdata[,list(total=sum(production)),by=list(date.x)]
ggplot(daily_series, aes(date.x,total, group=1)) + geom_line() +geom_point()
## Warning: Removed 5 rows containing missing values (`geom_line()`).
## Warning: Removed 5 rows containing missing values (`geom_point()`).

a=newdata[!is.na(production)]
acf(a$production)

newdata
## Key: <datetime>
##                   datetime     date.x hour.x dswrf_surface tcdc_low.cloud.layer
##                     <POSc>     <IDat>  <int>         <num>                <num>
##     1: 2022-01-01 04:00:00 2022-01-01      4        0.0000                2.384
##     2: 2022-01-01 05:00:00 2022-01-01      5        0.0000                2.784
##     3: 2022-01-01 06:00:00 2022-01-01      6        0.0000                2.964
##     4: 2022-01-01 07:00:00 2022-01-01      7        0.0000                3.284
##     5: 2022-01-01 08:00:00 2022-01-01      8        0.0000                3.672
##    ---                                                                         
## 21110: 2024-05-29 17:00:00 2024-05-29     17      557.5424               12.980
## 21111: 2024-05-29 18:00:00 2024-05-29     18      475.3048               12.872
## 21112: 2024-05-29 19:00:00 2024-05-29     19      394.9286               11.532
## 21113: 2024-05-29 20:00:00 2024-05-29     20      321.1014               10.816
## 21114: 2024-05-29 21:00:00 2024-05-29     21      267.5859               11.804
##        tcdc_middle.cloud.layer tcdc_high.cloud.layer tcdc_entire.atmosphere
##                          <num>                 <num>                  <num>
##     1:                   5.944                 4.604                 14.296
##     2:                   4.324                10.636                 19.272
##     3:                   5.372                11.688                 21.772
##     4:                   9.212                20.736                 31.992
##     5:                  11.252                26.432                 38.376
##    ---                                                                     
## 21110:                  39.776                 7.884                 48.748
## 21111:                  43.312                 8.256                 51.632
## 21112:                  43.768                 7.324                 51.172
## 21113:                  43.488                 6.524                 50.376
## 21114:                  42.588                 5.464                 48.780
##        uswrf_top_of_atmosphere csnow_surface dlwrf_surface swrf_surface
##                          <num>         <num>         <num>        <num>
##     1:                  0.0000             0       227.999      0.00000
##     2:                  0.0000             0       227.774      0.00000
##     3:                  0.0000             0       227.764      0.00000
##     4:                  0.0000             0       228.196      0.00000
##     5:                  0.0000             0       228.657      0.00000
##    ---                                                                 
## 21110:                261.6896             0       337.264    101.88800
## 21111:                242.1421             0       337.765     89.93280
## 21112:                214.1965             0       336.652     76.80448
## 21113:                179.0547             0       335.485     62.73920
## 21114:                149.2122             0       333.756     52.28160
##        tmp_surface production
##              <num>      <num>
##     1:     269.220        0.0
##     2:     269.104        0.0
##     3:     269.035        0.0
##     4:     269.001        0.0
##     5:     269.002        3.4
##    ---                       
## 21110:     297.253         NA
## 21111:     294.218         NA
## 21112:     291.334         NA
## 21113:     288.449         NA
## 21114:     287.744         NA

First, we aggregate production into a daily total series and draw a line plot of it. We then plot the autocorrelation function (ACF) of the hourly production values to identify patterns and periodicity in the time series.
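
To make the aggregation and ACF steps concrete, here is a minimal sketch on a synthetic hourly series with a daytime production bump. The series, its shape and all variable names are assumptions made for illustration. The point is that a daily cycle shows up as a strong positive autocorrelation at lag 24 and a negative one at lag 12.

```r
set.seed(42)
days <- 30
hour <- rep(0:23, times = days)
day  <- rep(seq_len(days), each = 24)

# Synthetic production: a half-sine between 06:00 and 18:00, zero at night, plus noise
prod <- pmax(0, sin((hour - 6) / 12 * pi)) * 10 + rnorm(length(hour), sd = 0.3)
prod <- pmax(prod, 0)

# Daily totals, analogous to summing production by date
daily_total <- tapply(prod, day, sum)

# Autocorrelation of the hourly series; plot = FALSE returns the values only
hourly_acf <- acf(prod, lag.max = 48, plot = FALSE)

# The first element is lag 0, so lag k sits at position k + 1
acf_lag12 <- hourly_acf$acf[12 + 1]
acf_lag24 <- hourly_acf$acf[24 + 1]
print(c(lag12 = acf_lag12, lag24 = acf_lag24))
```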

# frequency is the number of observations per seasonal cycle
production1 <- ts(production$production, frequency=365)
daily_ts_multip<-decompose(production1, type="additive")
plot(daily_ts_multip)

Seasonal and trend decomposition separates a time series into three components: trend, seasonal, and residual. The trend component captures the long-term direction, the seasonal component identifies regular repeating patterns, and the residual component represents random noise. Looking at these components separately helps us understand the different factors influencing solar power production and guides the choice of features for our predictive models.
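
The following sketch shows what an additive decomposition returns, using a synthetic hourly series with a known 24-hour cycle. The series and the choice of frequency = 24 are assumptions made for illustration, not the settings used in this project.

```r
set.seed(7)
n_days <- 20
t <- seq_len(24 * n_days)

# Known components: level 10, slow upward trend, 24-hour cycle of amplitude 5, noise
daily_cycle <- 5 * sin(2 * pi * t / 24)
trend_part  <- 0.01 * t
y <- ts(10 + trend_part + daily_cycle + rnorm(length(t), sd = 0.5), frequency = 24)

parts <- decompose(y, type = "additive")

# The seasonal figure has one value per position in the cycle (24 here)
n_seasonal <- length(parts$figure)

# Away from the edges (where the moving-average trend is NA),
# trend + seasonal + random adds back to the original series
ok <- !is.na(parts$trend)
recon_err <- max(abs((parts$trend + parts$seasonal + parts$random)[ok] - y[ok]))
print(c(n_seasonal = n_seasonal, recon_err = recon_err))
```

In an additive decomposition the three components sum to the observed series, which is why the reconstruction error is numerically zero wherever the trend is defined.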

Approach

Our approach to forecasting hourly solar power production involves several key steps to prepare and analyze the data, followed by developing predictive models.

First, we convert our cleaned and processed data into a data.table for efficient manipulation. We then create several lagged variables of the production data, which capture the influence of past production values on current production. The lags range from 1 hour to 96 hours, covering the previous four days of production.

In addition to lagged production values, we generate categorical features to capture temporal patterns. The hour of the day (hoursoftheday) and the quarter of the year (season) are encoded as factors. We also extract the hour, day of the month, week, and month from the date fields as the features saat (hour), gun (day), hafta (week), and ay (month).

To incorporate weather effects, we calculate the maximum (tmax) and minimum (tmin) daily surface temperatures. We also introduce a trend variable to capture any underlying trends over the period of data collection.

Finally, we create lagged weather variables, such as lagged downward shortwave radiation flux (dswrf_surface), to account for delayed effects of weather conditions on production.

By enriching our dataset with these engineered features, we aim to capture a wide range of factors influencing solar power production, which forms the basis for our predictive modeling.
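
The feature engineering described above can be sketched on a toy table as follows. The table, its values and the two lags are assumptions chosen so that the result can be checked by eye. The sketch relies on data.table's shift() and grouped assignment with by.

```r
library(data.table)

# Toy table: two days with four observations each (synthetic values)
toy <- data.table(
  date = rep(as.Date("2024-01-01") + 0:1, each = 4),
  production = c(0, 2, 5, 1, 0, 3, 6, 2),
  tmp_surface = c(270, 275, 280, 274, 268, 276, 282, 275)
)

# Lagged production: shift() returns one column per requested lag
lags <- c(1L, 2L)
lag_cols <- paste0("lag", lags)
toy[, (lag_cols) := shift(production, n = lags, fill = NA)]

# Daily temperature extremes, repeated on every row of the same date
toy[, tmax := max(tmp_surface), by = date]
toy[, tmin := min(tmp_surface), by = date]

# Linear trend: the row index
toy[, trend := seq_len(.N)]

print(toy)
```

Each lag column simply moves production down by the given number of rows, so the first rows are NA and drop out when a model that uses the lag is fitted.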

datapn<-data.table(newdata)
#head(datapn,15)

# Lagged production values (lags in hours); shift() returns one column per lag
lags <- c(15L,48L,72L,96L,95L,47L,71L,49L,73L,14L,13L,12L,11L,16L,24L,23L,25L,8L,6L,1L,2L)
lag_cols <- paste0("lag",lags)
datapn[,(lag_cols):=shift(production, n=lags, fill=NA)]
datapn$hoursoftheday<-as.factor(datapn$hour.x)
datapn$season<-as.factor(quarter(datapn$date.x))
datapn[,saat:=as.character(hour(datetime))]
datapn[,gun:=as.character(day(date.x))]
datapn[,hafta:=as.character(week(date.x))]
datapn[,ay:=as.character(month(date.x))]
datapn[,tmax:=max(tmp_surface),by=date.x]
datapn[,tmin:=min(tmp_surface),by=date.x]
# Linear trend (row index) and lagged shortwave radiation
datapn[,trend:=seq_len(.N)]
datapn[,lag1dswrf:=shift(dswrf_surface, n=1L, fill=NA)]
datapn[,lag12dswrf:=shift(dswrf_surface, n=12L, fill=NA)]

After preparing and enriching our dataset with lagged variables and categorical features, we proceed to develop and evaluate multiple linear regression models to predict solar power production. The process begins with simple models and progressively incorporates more variables to enhance predictive accuracy.

For example, the first model (lm0) is a simple linear regression where production is predicted solely based on the downward shortwave radiation flux (dswrf_surface).

We examine the summary and the residual diagnostics of each model individually to check its performance.
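
As a rough illustration of what the residual check looks for, the sketch below fits a regression with serially correlated errors and tests the residuals for autocorrelation. It uses base R's Box.test (Ljung-Box) as a stand-in for the Breusch-Godfrey test reported by checkresiduals. The simulated data and all variable names are assumptions made for illustration.

```r
set.seed(123)
n <- 500
x <- rnorm(n)

# AR(1) errors with coefficient 0.8, so the regression errors are serially correlated
e <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))
y <- 1 + 2 * x + e

# Model without any lagged response
fit <- lm(y ~ x)
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box")

# Adding the lagged response reduces the serial correlation in the residuals,
# although it need not remove it entirely
d <- data.frame(y = y[-1], x = x[-1], y_lag1 = y[-n])
fit_lag <- lm(y ~ x + y_lag1, data = d)
lb_lag <- Box.test(residuals(fit_lag), lag = 10, type = "Ljung-Box")

print(c(no_lag = unname(lb$statistic), with_lag = unname(lb_lag$statistic)))
```

A small p-value means the residuals are still autocorrelated, which signals that the model is missing temporal structure and that lagged terms or time-of-day effects may help.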

lm0<-lm(production~dswrf_surface,data = datapn)
summary(lm0)
## 
## Call:
## lm(formula = production ~ dswrf_surface, data = datapn)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.9182 -1.4676 -0.7549  1.2159  9.8480 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7.549e-01  2.579e-02   29.27   <2e-16 ***
## dswrf_surface 8.383e-03  7.419e-05  113.00   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.912 on 20993 degrees of freedom
##   (119 observations deleted due to missingness)
## Multiple R-squared:  0.3782, Adjusted R-squared:  0.3782 
## F-statistic: 1.277e+04 on 1 and 20993 DF,  p-value: < 2.2e-16
checkresiduals(lm0)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 15809, df = 10, p-value < 2.2e-16
#################################################


lm2<-lm(production~dswrf_surface+lag12,data = datapn)
summary(lm2)
## 
## Call:
## lm(formula = production ~ dswrf_surface + lag12, data = datapn)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.6608 -1.8337 -0.3909  1.1284  8.8525 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    1.834e+00  3.222e-02   56.92   <2e-16 ***
## dswrf_surface  6.819e-03  7.636e-05   89.31   <2e-16 ***
## lag12         -2.864e-01  5.602e-03  -51.12   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.746 on 20980 degrees of freedom
##   (131 observations deleted due to missingness)
## Multiple R-squared:  0.4472, Adjusted R-squared:  0.4472 
## F-statistic:  8487 on 2 and 20980 DF,  p-value: < 2.2e-16
checkresiduals(lm2)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 15261, df = 10, p-value < 2.2e-16
#############################################

lm3<-lm(production~dswrf_surface+lag12+tcdc_low.cloud.layer,data = datapn)
summary(lm3)
## 
## Call:
## lm(formula = production ~ dswrf_surface + lag12 + tcdc_low.cloud.layer, 
##     data = datapn)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.429 -1.865 -0.245  1.281  8.992 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           2.592e+00  3.832e-02   67.63   <2e-16 ***
## dswrf_surface         6.189e-03  7.652e-05   80.87   <2e-16 ***
## lag12                -3.367e-01  5.644e-03  -59.66   <2e-16 ***
## tcdc_low.cloud.layer -2.276e-02  6.620e-04  -34.38   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.672 on 20978 degrees of freedom
##   (132 observations deleted due to missingness)
## Multiple R-squared:  0.4767, Adjusted R-squared:  0.4766 
## F-statistic:  6370 on 3 and 20978 DF,  p-value: < 2.2e-16
checkresiduals(lm3)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 15040, df = 10, p-value < 2.2e-16
############################################

lm4<-lm(production~dswrf_surface+lag12+tcdc_low.cloud.layer+lag6+lag1,data = datapn)
summary(lm4)
## 
## Call:
## lm(formula = production ~ dswrf_surface + lag12 + tcdc_low.cloud.layer + 
##     lag6 + lag1, data = datapn)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.7170 -0.7270 -0.2142  0.6585  9.2641 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           1.249e+00  2.197e-02  56.847  < 2e-16 ***
## dswrf_surface        -4.525e-04  7.433e-05  -6.088 1.17e-09 ***
## lag12                -9.157e-02  3.135e-03 -29.208  < 2e-16 ***
## tcdc_low.cloud.layer -8.786e-03  3.555e-04 -24.715  < 2e-16 ***
## lag6                 -1.749e-01  3.509e-03 -49.842  < 2e-16 ***
## lag1                  8.942e-01  5.012e-03 178.406  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.396 on 20976 degrees of freedom
##   (132 observations deleted due to missingness)
## Multiple R-squared:  0.8573, Adjusted R-squared:  0.8572 
## F-statistic: 2.52e+04 on 5 and 20976 DF,  p-value: < 2.2e-16
checkresiduals(lm4)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  Residuals
## LM test = 2984.9, df = 10, p-value < 2.2e-16
##########################################

lm5<-lm(production~dswrf_surface+lag12+tcdc_low.cloud.layer+lag1+tmax+tmin+lag6,data = datapn)
summary(lm5)
## 
## Call:
## lm(formula = production ~ dswrf_surface + lag12 + tcdc_low.cloud.layer + 
##     lag1 + tmax + tmin + lag6, data = datapn)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.5202 -0.7429 -0.1304  0.6851  9.4678 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -8.146e-02  4.411e-01  -0.185    0.853    
## dswrf_surface        -7.483e-04  7.485e-05  -9.998  < 2e-16 ***
## lag12                -1.078e-01  3.188e-03 -33.818  < 2e-16 ***
## tcdc_low.cloud.layer -3.444e-03  4.293e-04  -8.024 1.08e-15 ***
## lag1                  8.947e-01  4.962e-03 180.312  < 2e-16 ***
## tmax                  3.895e-02  2.177e-03  17.892  < 2e-16 ***
## tmin                 -3.689e-02  3.387e-03 -10.890  < 2e-16 ***
## lag6                 -1.758e-01  3.475e-03 -50.595  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.38 on 20951 degrees of freedom
##   (155 observations deleted due to missingness)
## Multiple R-squared:  0.8605, Adjusted R-squared:  0.8604 
## F-statistic: 1.846e+04 on 7 and 20951 DF,  p-value: < 2.2e-16
checkresiduals(lm5)

## 
##  Breusch-Godfrey test for serial correlation of order up to 11
## 
## data:  Residuals
## LM test = 5409.6, df = 11, p-value < 2.2e-16
##########################################


lm6<-lm(production~dswrf_surface+lag12+tcdc_low.cloud.layer+lag1+tmax+hoursoftheday+ay,data = datapn)
summary(lm6)
## 
## Call:
## lm(formula = production ~ dswrf_surface + lag12 + tcdc_low.cloud.layer + 
##     lag1 + tmax + hoursoftheday + ay, data = datapn)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.6619 -0.2467  0.0136  0.3726  8.2767 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -2.583e+00  4.986e-01  -5.180 2.24e-07 ***
## dswrf_surface         1.494e-04  8.528e-05   1.752 0.079721 .  
## lag12                -5.812e-02  4.752e-03 -12.230  < 2e-16 ***
## tcdc_low.cloud.layer -4.355e-03  3.728e-04 -11.681  < 2e-16 ***
## lag1                  7.252e-01  5.162e-03 140.470  < 2e-16 ***
## tmax                  1.016e-02  1.759e-03   5.779 7.63e-09 ***
## hoursoftheday1       -2.419e-02  5.358e-02  -0.452 0.651603    
## hoursoftheday2       -8.560e-02  5.400e-02  -1.585 0.112948    
## hoursoftheday3       -1.915e-01  5.579e-02  -3.432 0.000599 ***
## hoursoftheday4       -3.093e-01  5.948e-02  -5.200 2.01e-07 ***
## hoursoftheday5       -3.280e-01  6.304e-02  -5.204 1.97e-07 ***
## hoursoftheday6        1.583e-01  6.501e-02   2.434 0.014925 *  
## hoursoftheday7        1.922e+00  6.496e-02  29.581  < 2e-16 ***
## hoursoftheday8        2.975e+00  6.524e-02  45.594  < 2e-16 ***
## hoursoftheday9        2.911e+00  6.773e-02  42.984  < 2e-16 ***
## hoursoftheday10       2.048e+00  7.225e-02  28.348  < 2e-16 ***
## hoursoftheday11       1.708e+00  7.460e-02  22.892  < 2e-16 ***
## hoursoftheday12       1.540e+00  7.620e-02  20.210  < 2e-16 ***
## hoursoftheday13       1.210e+00  7.719e-02  15.679  < 2e-16 ***
## hoursoftheday14       4.471e-01  7.731e-02   5.783 7.42e-09 ***
## hoursoftheday15      -5.921e-01  7.639e-02  -7.751 9.55e-15 ***
## hoursoftheday16      -1.416e+00  7.055e-02 -20.074  < 2e-16 ***
## hoursoftheday17      -1.367e+00  6.829e-02 -20.010  < 2e-16 ***
## hoursoftheday18      -9.759e-01  6.596e-02 -14.796  < 2e-16 ***
## hoursoftheday19      -3.525e-01  6.097e-02  -5.781 7.54e-09 ***
## hoursoftheday20      -1.435e-01  5.683e-02  -2.525 0.011582 *  
## hoursoftheday21      -3.338e-02  5.543e-02  -0.602 0.546962    
## hoursoftheday22       4.461e-03  5.353e-02   0.083 0.933581    
## hoursoftheday23       7.680e-03  5.353e-02   0.143 0.885933    
## ay10                  5.993e-02  4.955e-02   1.210 0.226422    
## ay11                 -1.988e-02  4.174e-02  -0.476 0.633828    
## ay12                 -2.592e-02  3.832e-02  -0.676 0.498821    
## ay2                   2.221e-01  3.495e-02   6.357 2.10e-10 ***
## ay3                   2.161e-01  3.708e-02   5.827 5.73e-09 ***
## ay4                   1.290e-01  4.639e-02   2.782 0.005415 ** 
## ay5                   1.677e-01  4.947e-02   3.389 0.000702 ***
## ay6                   1.437e-01  5.707e-02   2.517 0.011833 *  
## ay7                   2.175e-01  6.448e-02   3.373 0.000746 ***
## ay8                   8.470e-02  7.320e-02   1.157 0.247253    
## ay9                   1.275e-01  6.277e-02   2.032 0.042206 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.119 on 20919 degrees of freedom
##   (155 observations deleted due to missingness)
## Multiple R-squared:  0.9084, Adjusted R-squared:  0.9082 
## F-statistic:  5318 on 39 and 20919 DF,  p-value: < 2.2e-16
checkresiduals(lm6)

## 
##  Breusch-Godfrey test for serial correlation of order up to 43
## 
## data:  Residuals
## LM test = 2257.3, df = 43, p-value < 2.2e-16
#########################################

lm7<-lm(production~log(dswrf_surface+1)+lag12+log(tcdc_low.cloud.layer+1)+lag1+tmax+hoursoftheday,data = datapn)
summary(lm7)
## 
## Call:
## lm(formula = production ~ log(dswrf_surface + 1) + lag12 + log(tcdc_low.cloud.layer + 
##     1) + lag1 + tmax + hoursoftheday, data = datapn)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.5817 -0.2423  0.0149  0.3387  8.2946 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   -1.1869947  0.2458324  -4.828 1.39e-06 ***
## log(dswrf_surface + 1)         0.2479537  0.0143077  17.330  < 2e-16 ***
## lag12                         -0.0467701  0.0045833 -10.204  < 2e-16 ***
## log(tcdc_low.cloud.layer + 1) -0.0684753  0.0057417 -11.926  < 2e-16 ***
## lag1                           0.7122651  0.0049087 145.104  < 2e-16 ***
## tmax                           0.0056814  0.0008251   6.886 5.91e-12 ***
## hoursoftheday1                -0.0190866  0.0533467  -0.358   0.7205    
## hoursoftheday2                -0.0680072  0.0537354  -1.266   0.2057    
## hoursoftheday3                -0.1528892  0.0554064  -2.759   0.0058 ** 
## hoursoftheday4                -0.2565045  0.0589452  -4.352 1.36e-05 ***
## hoursoftheday5                -0.2574354  0.0623012  -4.132 3.61e-05 ***
## hoursoftheday6                 0.1592948  0.0642637   2.479   0.0132 *  
## hoursoftheday7                 1.6671100  0.0666115  25.027  < 2e-16 ***
## hoursoftheday8                 2.4512918  0.0717948  34.143  < 2e-16 ***
## hoursoftheday9                 2.1147454  0.0808276  26.164  < 2e-16 ***
## hoursoftheday10                0.8128840  0.1003606   8.100 5.81e-16 ***
## hoursoftheday11                0.4468876  0.1022329   4.371 1.24e-05 ***
## hoursoftheday12                0.2597592  0.1034482   2.511   0.0120 *  
## hoursoftheday13               -0.0822080  0.1041378  -0.789   0.4299    
## hoursoftheday14               -0.8546406  0.1043941  -8.187 2.84e-16 ***
## hoursoftheday15               -1.9059378  0.1044105 -18.254  < 2e-16 ***
## hoursoftheday16               -2.7015715  0.1015898 -26.593  < 2e-16 ***
## hoursoftheday17               -2.6379205  0.1009382 -26.134  < 2e-16 ***
## hoursoftheday18               -2.2205794  0.0991834 -22.389  < 2e-16 ***
## hoursoftheday19               -1.5786783  0.0949048 -16.634  < 2e-16 ***
## hoursoftheday20               -1.3519092  0.0906125 -14.920  < 2e-16 ***
## hoursoftheday21               -1.2211888  0.0881230 -13.858  < 2e-16 ***
## hoursoftheday22                0.0024854  0.0532986   0.047   0.9628    
## hoursoftheday23                0.0056090  0.0533007   0.105   0.9162    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.114 on 20930 degrees of freedom
##   (155 observations deleted due to missingness)
## Multiple R-squared:  0.9091, Adjusted R-squared:  0.909 
## F-statistic:  7479 on 28 and 20930 DF,  p-value: < 2.2e-16
checkresiduals(lm7)

## 
##  Breusch-Godfrey test for serial correlation of order up to 32
## 
## data:  Residuals
## LM test = 2108.4, df = 32, p-value < 2.2e-16
########################################

lm8<-lm(production~log(dswrf_surface+1)+lag12+season+log(tcdc_low.cloud.layer+1)+lag1+tmax+hoursoftheday,data = datapn)
summary(lm8)
## 
## Call:
## lm(formula = production ~ log(dswrf_surface + 1) + lag12 + season + 
##     log(tcdc_low.cloud.layer + 1) + lag1 + tmax + hoursoftheday, 
##     data = datapn)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.569 -0.241  0.014  0.343  8.286 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   -2.045279   0.389683  -5.249 1.55e-07 ***
## log(dswrf_surface + 1)         0.245992   0.015345  16.031  < 2e-16 ***
## lag12                         -0.048349   0.004633 -10.435  < 2e-16 ***
## season2                       -0.104497   0.030571  -3.418 0.000631 ***
## season3                       -0.119869   0.043065  -2.783 0.005383 ** 
## season4                       -0.098289   0.025379  -3.873 0.000108 ***
## log(tcdc_low.cloud.layer + 1) -0.067281   0.006038 -11.143  < 2e-16 ***
## lag1                           0.710898   0.004918 144.558  < 2e-16 ***
## tmax                           0.008866   0.001362   6.511 7.62e-11 ***
## hoursoftheday1                -0.019742   0.053329  -0.370 0.711241    
## hoursoftheday2                -0.070340   0.053726  -1.309 0.190469    
## hoursoftheday3                -0.158095   0.055432  -2.852 0.004348 ** 
## hoursoftheday4                -0.265027   0.059053  -4.488 7.23e-06 ***
## hoursoftheday5                -0.268403   0.062477  -4.296 1.75e-05 ***
## hoursoftheday6                 0.147833   0.064363   2.297 0.021636 *  
## hoursoftheday7                 1.658424   0.066689  24.868  < 2e-16 ***
## hoursoftheday8                 2.447971   0.072430  33.798  < 2e-16 ***
## hoursoftheday9                 2.117520   0.082505  25.665  < 2e-16 ***
## hoursoftheday10                0.822343   0.104184   7.893 3.09e-15 ***
## hoursoftheday11                0.457408   0.106260   4.305 1.68e-05 ***
## hoursoftheday12                0.270546   0.107626   2.514 0.011953 *  
## hoursoftheday13               -0.071516   0.108424  -0.660 0.509522    
## hoursoftheday14               -0.844527   0.108744  -7.766 8.46e-15 ***
## hoursoftheday15               -1.897353   0.108776 -17.443  < 2e-16 ***
## hoursoftheday16               -2.696327   0.105670 -25.517  < 2e-16 ***
## hoursoftheday17               -2.635966   0.104760 -25.162  < 2e-16 ***
## hoursoftheday18               -2.220310   0.102805 -21.597  < 2e-16 ***
## hoursoftheday19               -1.576474   0.098668 -15.978  < 2e-16 ***
## hoursoftheday20               -1.346029   0.094648 -14.221  < 2e-16 ***
## hoursoftheday21               -1.212698   0.092295 -13.139  < 2e-16 ***
## hoursoftheday22                0.002548   0.053280   0.048 0.961858    
## hoursoftheday23                0.005762   0.053282   0.108 0.913885    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.113 on 20927 degrees of freedom
##   (155 observations deleted due to missingness)
## Multiple R-squared:  0.9092, Adjusted R-squared:  0.9091 
## F-statistic:  6760 on 31 and 20927 DF,  p-value: < 2.2e-16
checkresiduals(lm8)

## 
##  Breusch-Godfrey test for serial correlation of order up to 35
## 
## data:  Residuals
## LM test = 2131.4, df = 35, p-value < 2.2e-16
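
checkresiduals() reports a Breusch-Godfrey test, which checks whether the regression residuals are serially correlated. As a rough illustration of what the statistic measures, the sketch below computes an LM statistic by hand in base R on simulated data. The residuals are regressed on the original regressor plus p of their own lags, and n times the R-squared of that auxiliary regression is compared with a chi-squared distribution with p degrees of freedom. The simulated series, the lag order p = 3, and the zero-padding of pre-sample residuals are assumptions of this sketch, not details taken from the project.

```r
set.seed(1)
n <- 500

# Simulated regression with AR(1) errors (made-up data, not datapn).
x <- rnorm(n)
e <- as.numeric(arima.sim(list(ar = 0.7), n = n))
y <- 1 + 2 * x + e

fit <- lm(y ~ x)
res <- residuals(fit)

# Build p lagged copies of the residuals; pre-sample values are set to zero.
p <- 3
lagmat <- sapply(1:p, function(k) c(rep(0, k), res[1:(n - k)]))

# Auxiliary regression: residuals on the original regressor and their own lags.
aux <- lm(res ~ x + lagmat)

# LM statistic and its chi-squared p-value with p degrees of freedom.
lm_stat <- n * summary(aux)$r.squared
p_value <- pchisq(lm_stat, df = p, lower.tail = FALSE)

print(c(LM = lm_stat, p_value = p_value))
```

A very small p-value, as in the outputs above, indicates that the residuals still carry predictable structure that the regressors do not explain.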
##########################################
lm9 <- lm(production ~ log(dswrf_surface + 1) + lag12 + season +
            log(tcdc_low.cloud.layer + 1) + tmax + hoursoftheday + trend + lag1,
          data = datapn)
summary(lm9)
## 
## Call:
## lm(formula = production ~ log(dswrf_surface + 1) + lag12 + season + 
##     log(tcdc_low.cloud.layer + 1) + tmax + hoursoftheday + trend + 
##     lag1, data = datapn)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.5650 -0.2418  0.0134  0.3424  8.2762 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   -2.137e+00  3.951e-01  -5.410 6.39e-08 ***
## log(dswrf_surface + 1)         2.445e-01  1.538e-02  15.891  < 2e-16 ***
## lag12                         -4.866e-02  4.638e-03 -10.490  < 2e-16 ***
## season2                       -1.090e-01  3.074e-02  -3.547 0.000390 ***
## season3                       -1.305e-01  4.371e-02  -2.985 0.002842 ** 
## season4                       -9.835e-02  2.538e-02  -3.875 0.000107 ***
## log(tcdc_low.cloud.layer + 1) -6.721e-02  6.038e-03 -11.132  < 2e-16 ***
## tmax                           9.261e-03  1.390e-03   6.662 2.76e-11 ***
## hoursoftheday1                -1.987e-02  5.333e-02  -0.373 0.709472    
## hoursoftheday2                -7.079e-02  5.373e-02  -1.318 0.187632    
## hoursoftheday3                -1.591e-01  5.544e-02  -2.870 0.004110 ** 
## hoursoftheday4                -2.667e-01  5.906e-02  -4.516 6.35e-06 ***
## hoursoftheday5                -2.706e-01  6.249e-02  -4.329 1.50e-05 ***
## hoursoftheday6                 1.459e-01  6.438e-02   2.267 0.023398 *  
## hoursoftheday7                 1.658e+00  6.669e-02  24.866  < 2e-16 ***
## hoursoftheday8                 2.450e+00  7.244e-02  33.819  < 2e-16 ***
## hoursoftheday9                 2.122e+00  8.256e-02  25.700  < 2e-16 ***
## hoursoftheday10                8.300e-01  1.043e-01   7.956 1.86e-15 ***
## hoursoftheday11                4.654e-01  1.064e-01   4.374 1.23e-05 ***
## hoursoftheday12                2.788e-01  1.078e-01   2.586 0.009704 ** 
## hoursoftheday13               -6.320e-02  1.086e-01  -0.582 0.560513    
## hoursoftheday14               -8.362e-01  1.089e-01  -7.679 1.68e-14 ***
## hoursoftheday15               -1.889e+00  1.089e-01 -17.344  < 2e-16 ***
## hoursoftheday16               -2.689e+00  1.058e-01 -25.416  < 2e-16 ***
## hoursoftheday17               -2.629e+00  1.049e-01 -25.071  < 2e-16 ***
## hoursoftheday18               -2.214e+00  1.029e-01 -21.515  < 2e-16 ***
## hoursoftheday19               -1.570e+00  9.878e-02 -15.893  < 2e-16 ***
## hoursoftheday20               -1.339e+00  9.478e-02 -14.127  < 2e-16 ***
## hoursoftheday21               -1.205e+00  9.244e-02 -13.039  < 2e-16 ***
## hoursoftheday22                2.587e-03  5.328e-02   0.049 0.961275    
## hoursoftheday23                5.821e-03  5.328e-02   0.109 0.913002    
## trend                         -1.854e-06  1.313e-06  -1.412 0.157892    
## lag1                           7.108e-01  4.919e-03 144.508  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.113 on 20926 degrees of freedom
##   (155 observations deleted due to missingness)
## Multiple R-squared:  0.9092, Adjusted R-squared:  0.9091 
## F-statistic:  6549 on 32 and 20926 DF,  p-value: < 2.2e-16
checkresiduals(lm9)

## 
##  Breusch-Godfrey test for serial correlation of order up to 36
## 
## data:  Residuals
## LM test = 2134.5, df = 36, p-value < 2.2e-16
##############################################

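# Note: lm10 uses the same formula as lm9, so the output below is identical to the lm9 output.
lm10 <- lm(production ~ log(dswrf_surface + 1) + lag12 + season +
             log(tcdc_low.cloud.layer + 1) + tmax + hoursoftheday + trend + lag1,
           data = datapn)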
summary(lm10)
## 
## Call:
## lm(formula = production ~ log(dswrf_surface + 1) + lag12 + season + 
##     log(tcdc_low.cloud.layer + 1) + tmax + hoursoftheday + trend + 
##     lag1, data = datapn)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.5650 -0.2418  0.0134  0.3424  8.2762 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   -2.137e+00  3.951e-01  -5.410 6.39e-08 ***
## log(dswrf_surface + 1)         2.445e-01  1.538e-02  15.891  < 2e-16 ***
## lag12                         -4.866e-02  4.638e-03 -10.490  < 2e-16 ***
## season2                       -1.090e-01  3.074e-02  -3.547 0.000390 ***
## season3                       -1.305e-01  4.371e-02  -2.985 0.002842 ** 
## season4                       -9.835e-02  2.538e-02  -3.875 0.000107 ***
## log(tcdc_low.cloud.layer + 1) -6.721e-02  6.038e-03 -11.132  < 2e-16 ***
## tmax                           9.261e-03  1.390e-03   6.662 2.76e-11 ***
## hoursoftheday1                -1.987e-02  5.333e-02  -0.373 0.709472    
## hoursoftheday2                -7.079e-02  5.373e-02  -1.318 0.187632    
## hoursoftheday3                -1.591e-01  5.544e-02  -2.870 0.004110 ** 
## hoursoftheday4                -2.667e-01  5.906e-02  -4.516 6.35e-06 ***
## hoursoftheday5                -2.706e-01  6.249e-02  -4.329 1.50e-05 ***
## hoursoftheday6                 1.459e-01  6.438e-02   2.267 0.023398 *  
## hoursoftheday7                 1.658e+00  6.669e-02  24.866  < 2e-16 ***
## hoursoftheday8                 2.450e+00  7.244e-02  33.819  < 2e-16 ***
## hoursoftheday9                 2.122e+00  8.256e-02  25.700  < 2e-16 ***
## hoursoftheday10                8.300e-01  1.043e-01   7.956 1.86e-15 ***
## hoursoftheday11                4.654e-01  1.064e-01   4.374 1.23e-05 ***
## hoursoftheday12                2.788e-01  1.078e-01   2.586 0.009704 ** 
## hoursoftheday13               -6.320e-02  1.086e-01  -0.582 0.560513    
## hoursoftheday14               -8.362e-01  1.089e-01  -7.679 1.68e-14 ***
## hoursoftheday15               -1.889e+00  1.089e-01 -17.344  < 2e-16 ***
## hoursoftheday16               -2.689e+00  1.058e-01 -25.416  < 2e-16 ***
## hoursoftheday17               -2.629e+00  1.049e-01 -25.071  < 2e-16 ***
## hoursoftheday18               -2.214e+00  1.029e-01 -21.515  < 2e-16 ***
## hoursoftheday19               -1.570e+00  9.878e-02 -15.893  < 2e-16 ***
## hoursoftheday20               -1.339e+00  9.478e-02 -14.127  < 2e-16 ***
## hoursoftheday21               -1.205e+00  9.244e-02 -13.039  < 2e-16 ***
## hoursoftheday22                2.587e-03  5.328e-02   0.049 0.961275    
## hoursoftheday23                5.821e-03  5.328e-02   0.109 0.913002    
## trend                         -1.854e-06  1.313e-06  -1.412 0.157892    
## lag1                           7.108e-01  4.919e-03 144.508  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.113 on 20926 degrees of freedom
##   (155 observations deleted due to missingness)
## Multiple R-squared:  0.9092, Adjusted R-squared:  0.9091 
## F-statistic:  6549 on 32 and 20926 DF,  p-value: < 2.2e-16
checkresiduals(lm10)

## 
##  Breusch-Godfrey test for serial correlation of order up to 36
## 
## data:  Residuals
## LM test = 2134.5, df = 36, p-value < 2.2e-16
##############################################

lm11 <- lm(production ~ log(dswrf_surface + 1) + lag12 + season +
             log(tcdc_low.cloud.layer + 1) + tmax + hoursoftheday + trend +
             lag2 + lag1 + lag1dswrf,
           data = datapn)
summary(lm11)
## 
## Call:
## lm(formula = production ~ log(dswrf_surface + 1) + lag12 + season + 
##     log(tcdc_low.cloud.layer + 1) + tmax + hoursoftheday + trend + 
##     lag2 + lag1 + lag1dswrf, data = datapn)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.5464 -0.2451  0.0155  0.3504  8.7715 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   -2.333e+00  3.969e-01  -5.879 4.20e-09 ***
## log(dswrf_surface + 1)         2.520e-01  1.564e-02  16.109  < 2e-16 ***
## lag12                         -5.193e-02  4.628e-03 -11.222  < 2e-16 ***
## season2                       -1.119e-01  3.081e-02  -3.633 0.000281 ***
## season3                       -1.395e-01  4.357e-02  -3.202 0.001367 ** 
## season4                       -1.108e-01  2.556e-02  -4.336 1.46e-05 ***
## log(tcdc_low.cloud.layer + 1) -7.517e-02  6.050e-03 -12.424  < 2e-16 ***
## tmax                           1.009e-02  1.397e-03   7.221 5.35e-13 ***
## hoursoftheday1                -2.119e-02  5.311e-02  -0.399 0.689831    
## hoursoftheday2                -7.552e-02  5.350e-02  -1.411 0.158123    
## hoursoftheday3                -1.698e-01  5.521e-02  -3.075 0.002110 ** 
## hoursoftheday4                -2.853e-01  5.884e-02  -4.849 1.25e-06 ***
## hoursoftheday5                -2.946e-01  6.227e-02  -4.730 2.26e-06 ***
## hoursoftheday6                 1.126e-01  6.419e-02   1.754 0.079501 .  
## hoursoftheday7                 1.585e+00  6.687e-02  23.700  < 2e-16 ***
## hoursoftheday8                 2.276e+00  7.395e-02  30.774  < 2e-16 ***
## hoursoftheday9                 1.963e+00  8.456e-02  23.211  < 2e-16 ***
## hoursoftheday10                7.726e-01  1.067e-01   7.241 4.62e-13 ***
## hoursoftheday11                5.403e-01  1.061e-01   5.090 3.61e-07 ***
## hoursoftheday12                3.983e-01  1.079e-01   3.691 0.000224 ***
## hoursoftheday13                6.960e-02  1.091e-01   0.638 0.523557    
## hoursoftheday14               -6.866e-01  1.099e-01  -6.249 4.21e-10 ***
## hoursoftheday15               -1.706e+00  1.104e-01 -15.451  < 2e-16 ***
## hoursoftheday16               -2.479e+00  1.081e-01 -22.940  < 2e-16 ***
## hoursoftheday17               -2.438e+00  1.059e-01 -23.032  < 2e-16 ***
## hoursoftheday18               -2.115e+00  1.031e-01 -20.516  < 2e-16 ***
## hoursoftheday19               -1.552e+00  9.861e-02 -15.739  < 2e-16 ***
## hoursoftheday20               -1.376e+00  9.452e-02 -14.562  < 2e-16 ***
## hoursoftheday21               -1.242e+00  9.211e-02 -13.483  < 2e-16 ***
## hoursoftheday22                2.633e-03  5.480e-02   0.048 0.961670    
## hoursoftheday23                6.222e-03  5.306e-02   0.117 0.906658    
## trend                         -2.079e-06  1.310e-06  -1.587 0.112606    
## lag2                          -9.106e-02  7.130e-03 -12.771  < 2e-16 ***
## lag1                           7.769e-01  6.982e-03 111.268  < 2e-16 ***
## lag1dswrf                      7.319e-07  8.251e-05   0.009 0.992922    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.109 on 20924 degrees of freedom
##   (155 observations deleted due to missingness)
## Multiple R-squared:   0.91,  Adjusted R-squared:  0.9098 
## F-statistic:  6221 on 34 and 20924 DF,  p-value: < 2.2e-16
checkresiduals(lm11)

## 
##  Breusch-Godfrey test for serial correlation of order up to 38
## 
## data:  Residuals
## LM test = 2080, df = 38, p-value < 2.2e-16
##############################################

lm12 <- lm(production ~ dswrf_surface + tmax + tcdc_entire.atmosphere +
             hoursoftheday + lag1 + lag24 + lag23 + lag25 + hafta + ay,
           data = datapn)
summary(lm12)
## 
## Call:
## lm(formula = production ~ dswrf_surface + tmax + tcdc_entire.atmosphere + 
##     hoursoftheday + lag1 + lag24 + lag23 + lag25 + hafta + ay, 
##     data = datapn)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.8056 -0.1944  0.0198  0.2923  8.3910 
## 
## Coefficients: (1 not defined because of singularities)
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            -2.057e+00  4.947e-01  -4.158 3.22e-05 ***
## dswrf_surface           1.927e-04  8.251e-05   2.335 0.019529 *  
## tmax                    7.489e-03  1.724e-03   4.343 1.41e-05 ***
## tcdc_entire.atmosphere -3.386e-03  2.610e-04 -12.973  < 2e-16 ***
## hoursoftheday1         -1.162e-03  5.140e-02  -0.023 0.981961    
## hoursoftheday2         -1.902e-03  5.140e-02  -0.037 0.970480    
## hoursoftheday3         -2.810e-03  5.140e-02  -0.055 0.956395    
## hoursoftheday4         -6.820e-03  5.140e-02  -0.133 0.894445    
## hoursoftheday5         -8.457e-03  5.154e-02  -0.164 0.869665    
## hoursoftheday6          2.035e-01  5.378e-02   3.785 0.000154 ***
## hoursoftheday7          1.398e+00  5.789e-02  24.155  < 2e-16 ***
## hoursoftheday8          2.152e+00  6.140e-02  35.046  < 2e-16 ***
## hoursoftheday9          2.145e+00  6.455e-02  33.237  < 2e-16 ***
## hoursoftheday10         1.473e+00  6.994e-02  21.058  < 2e-16 ***
## hoursoftheday11         1.224e+00  7.226e-02  16.933  < 2e-16 ***
## hoursoftheday12         1.127e+00  7.344e-02  15.345  < 2e-16 ***
## hoursoftheday13         9.586e-01  7.352e-02  13.038  < 2e-16 ***
## hoursoftheday14         5.036e-01  7.265e-02   6.931 4.29e-12 ***
## hoursoftheday15        -1.609e-01  7.104e-02  -2.265 0.023534 *  
## hoursoftheday16        -7.260e-01  6.380e-02 -11.380  < 2e-16 ***
## hoursoftheday17        -6.895e-01  5.963e-02 -11.563  < 2e-16 ***
## hoursoftheday18        -4.457e-01  5.702e-02  -7.817 5.66e-15 ***
## hoursoftheday19        -6.425e-02  5.537e-02  -1.160 0.245933    
## hoursoftheday20        -1.746e-02  5.407e-02  -0.323 0.746766    
## hoursoftheday21        -1.292e-02  5.327e-02  -0.243 0.808371    
## hoursoftheday22         3.497e-03  5.140e-02   0.068 0.945758    
## hoursoftheday23         1.715e-03  5.138e-02   0.033 0.973379    
## lag1                    6.975e-01  5.250e-03 132.846  < 2e-16 ***
## lag24                   1.819e-01  8.513e-03  21.370  < 2e-16 ***
## lag23                   1.028e-01  6.615e-03  15.536  < 2e-16 ***
## lag25                  -1.436e-01  6.822e-03 -21.054  < 2e-16 ***
## hafta10                -6.158e-02  1.531e-01  -0.402 0.687528    
## hafta11                 1.610e-03  1.534e-01   0.010 0.991626    
## hafta12                 4.108e-02  1.530e-01   0.269 0.788294    
## hafta13                -2.343e-02  1.545e-01  -0.152 0.879496    
## hafta14                -1.733e-01  2.245e-01  -0.772 0.440132    
## hafta15                -1.131e-01  2.240e-01  -0.505 0.613539    
## hafta16                -1.448e-01  2.251e-01  -0.643 0.520111    
## hafta17                -1.732e-01  2.254e-01  -0.768 0.442347    
## hafta18                -1.202e-01  2.457e-01  -0.489 0.624639    
## hafta19                -1.484e-01  2.564e-01  -0.579 0.562755    
## hafta2                  4.966e-02  6.881e-02   0.722 0.470478    
## hafta20                -1.045e-01  2.567e-01  -0.407 0.684012    
## hafta21                -1.840e-01  2.575e-01  -0.715 0.474866    
## hafta22                -2.169e-01  2.647e-01  -0.819 0.412545    
## hafta23                -2.744e-01  2.852e-01  -0.962 0.335904    
## hafta24                -2.985e-01  2.850e-01  -1.047 0.294902    
## hafta25                -3.241e-01  2.848e-01  -1.138 0.255169    
## hafta26                -2.982e-01  2.860e-01  -1.043 0.297076    
## hafta27                -2.354e-01  3.313e-01  -0.711 0.477378    
## hafta28                -1.763e-01  3.308e-01  -0.533 0.594179    
## hafta29                -2.357e-01  3.314e-01  -0.711 0.476950    
## hafta3                  3.509e-02  6.876e-02   0.510 0.609803    
## hafta30                -2.464e-01  3.320e-01  -0.742 0.457951    
## hafta31                -3.358e-01  3.451e-01  -0.973 0.330504    
## hafta32                -3.453e-01  3.570e-01  -0.967 0.333367    
## hafta33                -3.754e-01  3.571e-01  -1.051 0.293242    
## hafta34                -3.537e-01  3.566e-01  -0.992 0.321279    
## hafta35                -3.502e-01  3.584e-01  -0.977 0.328456    
## hafta36                -2.861e-01  3.783e-01  -0.756 0.449495    
## hafta37                -2.526e-01  3.780e-01  -0.668 0.503850    
## hafta38                -2.411e-01  3.781e-01  -0.638 0.523704    
## hafta39                -2.589e-01  3.775e-01  -0.686 0.492857    
## hafta4                 -7.709e-02  6.927e-02  -1.113 0.265751    
## hafta40                -1.868e-01  1.959e-01  -0.953 0.340459    
## hafta41                -1.593e-01  1.951e-01  -0.816 0.414361    
## hafta42                -1.933e-01  1.942e-01  -0.995 0.319555    
## hafta43                -1.556e-01  1.947e-01  -0.799 0.424320    
## hafta44                -1.323e-01  1.623e-01  -0.815 0.414964    
## hafta45                -1.528e-01  1.535e-01  -0.996 0.319492    
## hafta46                -1.426e-01  1.527e-01  -0.934 0.350387    
## hafta47                -8.786e-02  1.516e-01  -0.580 0.562105    
## hafta48                 5.671e-03  1.207e-01   0.047 0.962537    
## hafta49                 2.541e-02  7.708e-02   0.330 0.741655    
## hafta5                  2.341e-02  8.886e-02   0.263 0.792242    
## hafta50                 3.814e-02  7.721e-02   0.494 0.621293    
## hafta51                -8.684e-02  7.659e-02  -1.134 0.256834    
## hafta52                -5.098e-02  7.680e-02  -0.664 0.506810    
## hafta53                 5.969e-02  1.626e-01   0.367 0.713544    
## hafta6                  1.676e-02  1.190e-01   0.141 0.887974    
## hafta7                 -1.727e-02  1.190e-01  -0.145 0.884548    
## hafta8                 -1.862e-02  1.190e-01  -0.156 0.875691    
## hafta9                 -5.735e-02  1.300e-01  -0.441 0.659042    
## ay10                    1.250e-01  1.759e-01   0.710 0.477593    
## ay11                    7.202e-02  1.302e-01   0.553 0.580088    
## ay12                           NA         NA      NA       NA    
## ay2                     1.325e-01  9.683e-02   1.368 0.171292    
## ay3                     1.254e-01  1.363e-01   0.920 0.357524    
## ay4                     1.801e-01  2.125e-01   0.848 0.396589    
## ay5                     1.607e-01  2.448e-01   0.657 0.511495    
## ay6                     2.527e-01  2.719e-01   0.930 0.352635    
## ay7                     1.249e-01  3.193e-01   0.391 0.695561    
## ay8                     1.768e-01  3.446e-01   0.513 0.607844    
## ay9                     1.474e-01  3.682e-01   0.400 0.688971    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.073 on 20852 degrees of freedom
##   (169 observations deleted due to missingness)
## Multiple R-squared:  0.9159, Adjusted R-squared:  0.9156 
## F-statistic:  2469 on 92 and 20852 DF,  p-value: < 2.2e-16
checkresiduals(lm12)

## 
##  Breusch-Godfrey test for serial correlation of order up to 97
## 
## data:  Residuals
## LM test = 1354.2, df = 97, p-value < 2.2e-16
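
The summary above reports "1 not defined because of singularities" and an NA row for ay12. lm() does this when one column of the design matrix is an exact linear combination of the others, which is plausible here because week of the year (hafta) and month (ay) overlap. The sketch below reproduces the effect on made-up data; the variables week, month, and y are hypothetical and are not taken from datapn.

```r
set.seed(42)

# Made-up factors: weeks 1-2 fall in month 1 and weeks 3-4 in month 2,
# so the month dummy is fully determined by the week dummies.
week  <- factor(rep(1:4, each = 10))
month <- factor(rep(1:2, each = 20))
y     <- rnorm(40)

fit <- lm(y ~ week + month)

# lm() still fits the model but reports NA for the redundant column.
print(coef(fit))
aliased <- names(coef(fit))[is.na(coef(fit))]
print(aliased)
```

The fitted values are unaffected by the dropped column, but the remaining week and month coefficients cannot be interpreted separately.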
##############################################

lm13 <- lm(production ~ dswrf_surface + tmp_surface + tcdc_entire.atmosphere +
             lag73 + lag72 + lag71 + lag48 + lag47 + lag49 +
             ay + hoursoftheday + hafta,
           data = datapn)
summary(lm13)
## 
## Call:
## lm(formula = production ~ dswrf_surface + tmp_surface + tcdc_entire.atmosphere + 
##     lag73 + lag72 + lag71 + lag48 + lag47 + lag49 + ay + hoursoftheday + 
##     hafta, data = datapn)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.1546 -0.4938  0.0334  0.6243  8.1348 
## 
## Coefficients: (1 not defined because of singularities)
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            -1.332e+01  7.362e-01 -18.095  < 2e-16 ***
## dswrf_surface           3.363e-03  1.196e-04  28.124  < 2e-16 ***
## tmp_surface             5.118e-02  2.696e-03  18.983  < 2e-16 ***
## tcdc_entire.atmosphere -1.133e-02  3.547e-04 -31.941  < 2e-16 ***
## lag73                  -2.727e-02  9.588e-03  -2.844 0.004456 ** 
## lag72                   1.253e-01  1.197e-02  10.469  < 2e-16 ***
## lag71                   2.693e-02  9.548e-03   2.821 0.004794 ** 
## lag48                   1.744e-01  1.197e-02  14.568  < 2e-16 ***
## lag47                   4.096e-02  9.545e-03   4.292 1.78e-05 ***
## lag49                  -3.237e-02  9.635e-03  -3.359 0.000784 ***
## ay10                    8.955e-01  3.333e-01   2.687 0.007214 ** 
## ay11                    5.753e-01  2.898e-01   1.985 0.047150 *  
## ay12                    4.499e-01  2.267e-01   1.984 0.047218 *  
## ay2                     5.847e-01  1.342e-01   4.357 1.32e-05 ***
## ay3                     5.206e-01  1.890e-01   2.755 0.005870 ** 
## ay4                     8.294e-01  2.947e-01   2.814 0.004894 ** 
## ay5                     9.637e-01  3.396e-01   2.838 0.004543 ** 
## ay6                     1.058e+00  3.771e-01   2.806 0.005016 ** 
## ay7                     7.699e-01  4.430e-01   1.738 0.082249 .  
## ay8                     8.581e-01  4.781e-01   1.795 0.072719 .  
## ay9                     7.707e-01  5.109e-01   1.509 0.131388    
## hoursoftheday1          1.637e-02  7.134e-02   0.229 0.818550    
## hoursoftheday2          3.317e-02  7.136e-02   0.465 0.642119    
## hoursoftheday3          4.943e-02  7.140e-02   0.692 0.488769    
## hoursoftheday4          6.383e-02  7.145e-02   0.893 0.371687    
## hoursoftheday5          9.228e-02  7.183e-02   1.285 0.198882    
## hoursoftheday6          3.553e-01  7.660e-02   4.639 3.53e-06 ***
## hoursoftheday7          1.638e+00  8.430e-02  19.431  < 2e-16 ***
## hoursoftheday8          3.255e+00  8.944e-02  36.394  < 2e-16 ***
## hoursoftheday9          4.324e+00  9.185e-02  47.075  < 2e-16 ***
## hoursoftheday10         3.344e+00  1.002e-01  33.373  < 2e-16 ***
## hoursoftheday11         3.042e+00  1.031e-01  29.499  < 2e-16 ***
## hoursoftheday12         2.716e+00  1.046e-01  25.976  < 2e-16 ***
## hoursoftheday13         2.350e+00  1.043e-01  22.530  < 2e-16 ***
## hoursoftheday14         1.690e+00  1.028e-01  16.439  < 2e-16 ***
## hoursoftheday15         6.165e-01  1.006e-01   6.130 8.94e-10 ***
## hoursoftheday16        -1.930e-01  9.007e-02  -2.143 0.032141 *  
## hoursoftheday17        -9.460e-01  8.335e-02 -11.350  < 2e-16 ***
## hoursoftheday18        -1.167e+00  7.894e-02 -14.779  < 2e-16 ***
## hoursoftheday19        -9.316e-01  7.645e-02 -12.185  < 2e-16 ***
## hoursoftheday20        -7.046e-01  7.488e-02  -9.410  < 2e-16 ***
## hoursoftheday21        -5.683e-01  7.388e-02  -7.693 1.50e-14 ***
## hoursoftheday22        -3.435e-02  7.137e-02  -0.481 0.630327    
## hoursoftheday23        -1.664e-02  7.132e-02  -0.233 0.815535    
## hafta10                -6.244e-01  2.131e-01  -2.930 0.003392 ** 
## hafta11                -3.161e-01  2.133e-01  -1.482 0.138285    
## hafta12                -1.739e-01  2.134e-01  -0.815 0.414993    
## hafta13                -5.574e-01  2.145e-01  -2.599 0.009349 ** 
## hafta14                -1.367e+00  3.111e-01  -4.395 1.11e-05 ***
## hafta15                -1.108e+00  3.109e-01  -3.565 0.000365 ***
## hafta16                -1.335e+00  3.116e-01  -4.286 1.82e-05 ***
## hafta17                -1.450e+00  3.118e-01  -4.650 3.34e-06 ***
## hafta18                -1.366e+00  3.403e-01  -4.014 6.00e-05 ***
## hafta19                -1.573e+00  3.546e-01  -4.436 9.22e-06 ***
## hafta2                 -1.306e-02  9.814e-02  -0.133 0.894125    
## hafta20                -1.451e+00  3.549e-01  -4.088 4.38e-05 ***
## hafta21                -1.696e+00  3.559e-01  -4.767 1.88e-06 ***
## hafta22                -1.854e+00  3.655e-01  -5.071 3.99e-07 ***
## hafta23                -1.891e+00  3.943e-01  -4.797 1.63e-06 ***
## hafta24                -1.997e+00  3.943e-01  -5.066 4.09e-07 ***
## hafta25                -2.130e+00  3.943e-01  -5.401 6.70e-08 ***
## hafta26                -2.041e+00  3.959e-01  -5.155 2.56e-07 ***
## hafta27                -2.012e+00  4.582e-01  -4.391 1.13e-05 ***
## hafta28                -1.760e+00  4.579e-01  -3.843 0.000122 ***
## hafta29                -1.982e+00  4.582e-01  -4.326 1.53e-05 ***
## hafta3                  1.372e-01  9.824e-02   1.397 0.162402    
## hafta30                -2.004e+00  4.585e-01  -4.372 1.24e-05 ***
## hafta31                -2.261e+00  4.764e-01  -4.745 2.10e-06 ***
## hafta32                -2.241e+00  4.928e-01  -4.547 5.46e-06 ***
## hafta33                -2.319e+00  4.928e-01  -4.706 2.54e-06 ***
## hafta34                -2.177e+00  4.923e-01  -4.421 9.87e-06 ***
## hafta35                -2.135e+00  4.949e-01  -4.313 1.61e-05 ***
## hafta36                -1.867e+00  5.233e-01  -3.568 0.000361 ***
## hafta37                -1.647e+00  5.229e-01  -3.150 0.001634 ** 
## hafta38                -1.613e+00  5.230e-01  -3.083 0.002050 ** 
## hafta39                -1.611e+00  5.229e-01  -3.081 0.002063 ** 
## hafta4                 -4.070e-01  9.903e-02  -4.109 3.98e-05 ***
## hafta40                -1.549e+00  3.383e-01  -4.578 4.72e-06 ***
## hafta41                -1.413e+00  3.376e-01  -4.186 2.84e-05 ***
## hafta42                -1.594e+00  3.372e-01  -4.727 2.29e-06 ***
## hafta43                -1.215e+00  3.374e-01  -3.601 0.000317 ***
## hafta44                -1.081e+00  3.025e-01  -3.572 0.000355 ***
## hafta45                -1.148e+00  2.940e-01  -3.906 9.43e-05 ***
## hafta46                -9.236e-01  2.932e-01  -3.150 0.001637 ** 
## hafta47                -8.388e-01  2.925e-01  -2.868 0.004141 ** 
## hafta48                -5.179e-01  2.638e-01  -1.963 0.049624 *  
## hafta49                -4.836e-01  2.301e-01  -2.102 0.035605 *  
## hafta5                 -2.655e-02  1.260e-01  -0.211 0.833073    
## hafta50                -5.061e-01  2.304e-01  -2.197 0.028048 *  
## hafta51                -9.263e-01  2.299e-01  -4.030 5.60e-05 ***
## hafta52                -5.683e-01  2.300e-01  -2.471 0.013497 *  
## hafta53                        NA         NA      NA       NA    
## hafta6                 -1.652e-01  1.672e-01  -0.988 0.323254    
## hafta7                 -2.374e-01  1.672e-01  -1.419 0.155783    
## hafta8                 -3.513e-01  1.664e-01  -2.111 0.034802 *  
## hafta9                 -6.465e-01  1.806e-01  -3.579 0.000345 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.489 on 20825 degrees of freedom
##   (194 observations deleted due to missingness)
## Multiple R-squared:  0.8384, Adjusted R-squared:  0.8377 
## F-statistic:  1149 on 94 and 20825 DF,  p-value: < 2.2e-16
checkresiduals(lm13)

## 
##  Breusch-Godfrey test for serial correlation of order up to 99
## 
## data:  Residuals
## LM test = 10694, df = 99, p-value < 2.2e-16

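Continuing the search, we found that Model 12 (lm12) reaches an adjusted R^2 of 0.9156, which would meet our expectations. However, lm12 relies on lag1 (as well as lag23, lag24, and lag25), and these values are not yet observed when we forecast, because production data are only available up to two days before the target date. We therefore constructed a very similar model, lm13, that replaces them with lags of roughly 48 and 72 hours (lag47 to lag49 and lag71 to lag73). Its adjusted R^2 drops to 0.8377, but it is the best candidate for our aim.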
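
The lagged regressors discussed above can be built with a simple shift of the production series. The following base R sketch shows one possible way to do it on a toy series; the helper lag_vec() and the data frame toy are hypothetical names introduced for illustration, and the values are made up rather than taken from the plant data.

```r
# Hypothetical helper: shift a numeric vector back by k hours, padding with NA.
lag_vec <- function(x, k) {
  stopifnot(k >= 1, k < length(x))
  c(rep(NA_real_, k), x[seq_len(length(x) - k)])
}

# Toy hourly series standing in for the production column (values are made up).
toy <- data.frame(production = as.numeric(1:100))

# The six production lags that appear in the lm13 formula.
for (k in c(47, 48, 49, 71, 72, 73)) {
  toy[[paste0("lag", k)]] <- lag_vec(toy$production, k)
}

# Rows without a full set of lags contain NA and are dropped by lm() by default.
complete_rows <- sum(complete.cases(toy))
print(tail(toy, 3))
print(complete_rows)
```

With data.table, which the project already loads, shift() offers the same operation; the base R version is used here only to keep the sketch free of package dependencies.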

After evaluating multiple models, we chose Model 13 (lm13) for our final analysis. This decision was based on its suitability for the forecasting setup rather than on in-sample fit alone: unlike lm12, it relies only on production lags of 47 hours or more. Model 13 combines three weather variables, downward shortwave radiation flux (dswrf_surface), surface temperature (tmp_surface), and total cloud cover (tcdc_entire.atmosphere), with six lagged production values (lag47, lag48, lag49, lag71, lag72, and lag73). It also includes categorical time features: hour of the day (hoursoftheday), month (ay), and week of the year (hafta). Together these terms capture the daily and seasonal production cycle as well as the weather-driven variation around it. The Breusch-Godfrey test still rejects the absence of serial correlation in the residuals (LM = 10694, df = 99), so the reported standard errors should be read with caution.
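
The trade-off behind this choice can be illustrated on simulated data: a model that uses the previous hour tends to fit better in-sample than one restricted to a 48-hour lag, but only the latter remains usable when the most recent observations are missing. In the sketch below, the simulated series, the AR(1) noise, and the helpers lag_vec() and fit_summary() are hypothetical; none of the numbers relate to the Edikli GES data.

```r
set.seed(360)
n_days <- 60
hour   <- factor(rep(0:23, times = n_days))

# Made-up daily profile: zero at night, bell-shaped during the day.
profile <- pmax(0, sin((0:23 - 6) / 12 * pi)) * 10
noise   <- as.numeric(arima.sim(list(ar = 0.8), n = 24 * n_days))
y       <- profile[as.integer(hour)] + noise

# Hypothetical helper: shift a vector back by k steps, padding with NA.
lag_vec <- function(x, k) c(rep(NA_real_, k), x[seq_len(length(x) - k)])

d <- data.frame(y = y, hour = hour,
                lag1 = lag_vec(y, 1), lag48 = lag_vec(y, 48))

# Hypothetical helper: collect the fit statistics compared in this report.
fit_summary <- function(model) {
  s <- summary(model)
  data.frame(adj_r2 = s$adj.r.squared, sigma = s$sigma, n = nobs(model))
}

m_lag1  <- lm(y ~ hour + lag1,  data = d)
m_lag48 <- lm(y ~ hour + lag48, data = d)

# First row: model with lag1; second row: model with lag48.
comparison <- rbind(fit_summary(m_lag1), fit_summary(m_lag48))
print(comparison)
```

The same helper could be applied to lm12 and lm13 to place their adjusted R^2 and residual standard error side by side.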

Results

# Refit the selected model (same formula as lm13 above).
lm13 <- lm(production ~ dswrf_surface + tmp_surface + tcdc_entire.atmosphere +
             lag73 + lag72 + lag71 + lag48 + lag47 + lag49 +
             ay + hoursoftheday + hafta,
           data = datapn)
summary(lm13)
## 
## Call:
## lm(formula = production ~ dswrf_surface + tmp_surface + tcdc_entire.atmosphere + 
##     lag73 + lag72 + lag71 + lag48 + lag47 + lag49 + ay + hoursoftheday + 
##     hafta, data = datapn)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.1546 -0.4938  0.0334  0.6243  8.1348 
## 
## Coefficients: (1 not defined because of singularities)
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            -1.332e+01  7.362e-01 -18.095  < 2e-16 ***
## dswrf_surface           3.363e-03  1.196e-04  28.124  < 2e-16 ***
## tmp_surface             5.118e-02  2.696e-03  18.983  < 2e-16 ***
## tcdc_entire.atmosphere -1.133e-02  3.547e-04 -31.941  < 2e-16 ***
## lag73                  -2.727e-02  9.588e-03  -2.844 0.004456 ** 
## lag72                   1.253e-01  1.197e-02  10.469  < 2e-16 ***
## lag71                   2.693e-02  9.548e-03   2.821 0.004794 ** 
## lag48                   1.744e-01  1.197e-02  14.568  < 2e-16 ***
## lag47                   4.096e-02  9.545e-03   4.292 1.78e-05 ***
## lag49                  -3.237e-02  9.635e-03  -3.359 0.000784 ***
## ay10                    8.955e-01  3.333e-01   2.687 0.007214 ** 
## ay11                    5.753e-01  2.898e-01   1.985 0.047150 *  
## ay12                    4.499e-01  2.267e-01   1.984 0.047218 *  
## ay2                     5.847e-01  1.342e-01   4.357 1.32e-05 ***
## ay3                     5.206e-01  1.890e-01   2.755 0.005870 ** 
## ay4                     8.294e-01  2.947e-01   2.814 0.004894 ** 
## ay5                     9.637e-01  3.396e-01   2.838 0.004543 ** 
## ay6                     1.058e+00  3.771e-01   2.806 0.005016 ** 
## ay7                     7.699e-01  4.430e-01   1.738 0.082249 .  
## ay8                     8.581e-01  4.781e-01   1.795 0.072719 .  
## ay9                     7.707e-01  5.109e-01   1.509 0.131388    
## hoursoftheday1          1.637e-02  7.134e-02   0.229 0.818550    
## hoursoftheday2          3.317e-02  7.136e-02   0.465 0.642119    
## hoursoftheday3          4.943e-02  7.140e-02   0.692 0.488769    
## hoursoftheday4          6.383e-02  7.145e-02   0.893 0.371687    
## hoursoftheday5          9.228e-02  7.183e-02   1.285 0.198882    
## hoursoftheday6          3.553e-01  7.660e-02   4.639 3.53e-06 ***
## hoursoftheday7          1.638e+00  8.430e-02  19.431  < 2e-16 ***
## hoursoftheday8          3.255e+00  8.944e-02  36.394  < 2e-16 ***
## hoursoftheday9          4.324e+00  9.185e-02  47.075  < 2e-16 ***
## hoursoftheday10         3.344e+00  1.002e-01  33.373  < 2e-16 ***
## hoursoftheday11         3.042e+00  1.031e-01  29.499  < 2e-16 ***
## hoursoftheday12         2.716e+00  1.046e-01  25.976  < 2e-16 ***
## hoursoftheday13         2.350e+00  1.043e-01  22.530  < 2e-16 ***
## hoursoftheday14         1.690e+00  1.028e-01  16.439  < 2e-16 ***
## hoursoftheday15         6.165e-01  1.006e-01   6.130 8.94e-10 ***
## hoursoftheday16        -1.930e-01  9.007e-02  -2.143 0.032141 *  
## hoursoftheday17        -9.460e-01  8.335e-02 -11.350  < 2e-16 ***
## hoursoftheday18        -1.167e+00  7.894e-02 -14.779  < 2e-16 ***
## hoursoftheday19        -9.316e-01  7.645e-02 -12.185  < 2e-16 ***
## hoursoftheday20        -7.046e-01  7.488e-02  -9.410  < 2e-16 ***
## hoursoftheday21        -5.683e-01  7.388e-02  -7.693 1.50e-14 ***
## hoursoftheday22        -3.435e-02  7.137e-02  -0.481 0.630327    
## hoursoftheday23        -1.664e-02  7.132e-02  -0.233 0.815535    
## hafta10                -6.244e-01  2.131e-01  -2.930 0.003392 ** 
## hafta11                -3.161e-01  2.133e-01  -1.482 0.138285    
## hafta12                -1.739e-01  2.134e-01  -0.815 0.414993    
## hafta13                -5.574e-01  2.145e-01  -2.599 0.009349 ** 
## hafta14                -1.367e+00  3.111e-01  -4.395 1.11e-05 ***
## hafta15                -1.108e+00  3.109e-01  -3.565 0.000365 ***
## hafta16                -1.335e+00  3.116e-01  -4.286 1.82e-05 ***
## hafta17                -1.450e+00  3.118e-01  -4.650 3.34e-06 ***
## hafta18                -1.366e+00  3.403e-01  -4.014 6.00e-05 ***
## hafta19                -1.573e+00  3.546e-01  -4.436 9.22e-06 ***
## hafta2                 -1.306e-02  9.814e-02  -0.133 0.894125    
## hafta20                -1.451e+00  3.549e-01  -4.088 4.38e-05 ***
## hafta21                -1.696e+00  3.559e-01  -4.767 1.88e-06 ***
## hafta22                -1.854e+00  3.655e-01  -5.071 3.99e-07 ***
## hafta23                -1.891e+00  3.943e-01  -4.797 1.63e-06 ***
## hafta24                -1.997e+00  3.943e-01  -5.066 4.09e-07 ***
## hafta25                -2.130e+00  3.943e-01  -5.401 6.70e-08 ***
## hafta26                -2.041e+00  3.959e-01  -5.155 2.56e-07 ***
## hafta27                -2.012e+00  4.582e-01  -4.391 1.13e-05 ***
## hafta28                -1.760e+00  4.579e-01  -3.843 0.000122 ***
## hafta29                -1.982e+00  4.582e-01  -4.326 1.53e-05 ***
## hafta3                  1.372e-01  9.824e-02   1.397 0.162402    
## hafta30                -2.004e+00  4.585e-01  -4.372 1.24e-05 ***
## hafta31                -2.261e+00  4.764e-01  -4.745 2.10e-06 ***
## hafta32                -2.241e+00  4.928e-01  -4.547 5.46e-06 ***
## hafta33                -2.319e+00  4.928e-01  -4.706 2.54e-06 ***
## hafta34                -2.177e+00  4.923e-01  -4.421 9.87e-06 ***
## hafta35                -2.135e+00  4.949e-01  -4.313 1.61e-05 ***
## hafta36                -1.867e+00  5.233e-01  -3.568 0.000361 ***
## hafta37                -1.647e+00  5.229e-01  -3.150 0.001634 ** 
## hafta38                -1.613e+00  5.230e-01  -3.083 0.002050 ** 
## hafta39                -1.611e+00  5.229e-01  -3.081 0.002063 ** 
## hafta4                 -4.070e-01  9.903e-02  -4.109 3.98e-05 ***
## hafta40                -1.549e+00  3.383e-01  -4.578 4.72e-06 ***
## hafta41                -1.413e+00  3.376e-01  -4.186 2.84e-05 ***
## hafta42                -1.594e+00  3.372e-01  -4.727 2.29e-06 ***
## hafta43                -1.215e+00  3.374e-01  -3.601 0.000317 ***
## hafta44                -1.081e+00  3.025e-01  -3.572 0.000355 ***
## hafta45                -1.148e+00  2.940e-01  -3.906 9.43e-05 ***
## hafta46                -9.236e-01  2.932e-01  -3.150 0.001637 ** 
## hafta47                -8.388e-01  2.925e-01  -2.868 0.004141 ** 
## hafta48                -5.179e-01  2.638e-01  -1.963 0.049624 *  
## hafta49                -4.836e-01  2.301e-01  -2.102 0.035605 *  
## hafta5                 -2.655e-02  1.260e-01  -0.211 0.833073    
## hafta50                -5.061e-01  2.304e-01  -2.197 0.028048 *  
## hafta51                -9.263e-01  2.299e-01  -4.030 5.60e-05 ***
## hafta52                -5.683e-01  2.300e-01  -2.471 0.013497 *  
## hafta53                        NA         NA      NA       NA    
## hafta6                 -1.652e-01  1.672e-01  -0.988 0.323254    
## hafta7                 -2.374e-01  1.672e-01  -1.419 0.155783    
## hafta8                 -3.513e-01  1.664e-01  -2.111 0.034802 *  
## hafta9                 -6.465e-01  1.806e-01  -3.579 0.000345 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.489 on 20825 degrees of freedom
##   (194 observations deleted due to missingness)
## Multiple R-squared:  0.8384, Adjusted R-squared:  0.8377 
## F-statistic:  1149 on 94 and 20825 DF,  p-value: < 2.2e-16
#checkresiduals(lm13)

tmp=copy(datapn)
tmp=tmp[tmp$date.x >="2024-05-14",]

tmp[,actual:=production]

tmp[,predicted_trend:=predict(lm12,tmp)]

tmp[,residual_trend:=actual-predicted_trend]
tmp[,hour.y:=hour.x]
tmp[,date.y:=date.x]
tmp
## Key: <datetime>
##                 datetime     date.x hour.x dswrf_surface tcdc_low.cloud.layer
##                   <POSc>     <IDat>  <int>         <num>                <num>
##   1: 2024-05-14 00:00:00 2024-05-14      0        0.0000               27.748
##   2: 2024-05-14 01:00:00 2024-05-14      1        0.0000               23.612
##   3: 2024-05-14 02:00:00 2024-05-14      2        0.0000               22.176
##   4: 2024-05-14 03:00:00 2024-05-14      3        0.0000               25.244
##   5: 2024-05-14 04:00:00 2024-05-14      4        0.0000               66.244
##  ---                                                                         
## 378: 2024-05-29 17:00:00 2024-05-29     17      557.5424               12.980
## 379: 2024-05-29 18:00:00 2024-05-29     18      475.3048               12.872
## 380: 2024-05-29 19:00:00 2024-05-29     19      394.9286               11.532
## 381: 2024-05-29 20:00:00 2024-05-29     20      321.1014               10.816
## 382: 2024-05-29 21:00:00 2024-05-29     21      267.5859               11.804
##      tcdc_middle.cloud.layer tcdc_high.cloud.layer tcdc_entire.atmosphere
##                        <num>                 <num>                  <num>
##   1:                  19.448                 0.000                 33.540
##   2:                  16.940                 0.000                 29.796
##   3:                  15.980                 0.000                 28.896
##   4:                  15.400                 0.000                 31.368
##   5:                   6.440                 0.000                 67.796
##  ---                                                                     
## 378:                  39.776                 7.884                 48.748
## 379:                  43.312                 8.256                 51.632
## 380:                  43.768                 7.324                 51.172
## 381:                  43.488                 6.524                 50.376
## 382:                  42.588                 5.464                 48.780
##      uswrf_top_of_atmosphere csnow_surface dlwrf_surface swrf_surface
##                        <num>         <num>         <num>        <num>
##   1:                  0.0000          0.04       263.859      0.00000
##   2:                  0.0000          0.04       261.819      0.00000
##   3:                  0.0000          0.04       260.644      0.00000
##   4:                  0.0000          0.04       262.063      0.00000
##   5:                  0.0000          0.12       289.007      0.00000
##  ---                                                                 
## 378:                261.6896          0.00       337.264    101.88800
## 379:                242.1421          0.00       337.765     89.93280
## 380:                214.1965          0.00       336.652     76.80448
## 381:                179.0547          0.00       335.485     62.73920
## 382:                149.2122          0.00       333.756     52.28160
##      tmp_surface production lag15 lag48 lag72 lag96 lag95 lag47 lag71 lag49
##            <num>      <num> <num> <num> <num> <num> <num> <num> <num> <num>
##   1:    278.0520       0.00  8.88     0  0.00  0.00  0.00     0  0.00     0
##   2:    277.6708       0.00  7.85     0  0.00  0.00  0.00     0  0.00     0
##   3:    277.3680       0.00  8.36     0  0.00  0.00  0.00     0  0.00     0
##   4:    277.5830       0.00  5.34     0  0.00  0.00  0.05     0  0.03     0
##   5:    277.9880       0.07  4.04     0  0.03  0.05  0.63     0  0.64     0
##  ---                                                                       
## 378:    297.2530         NA    NA    NA    NA    NA    NA    NA    NA    NA
## 379:    294.2180         NA    NA    NA    NA    NA    NA    NA    NA    NA
## 380:    291.3340         NA    NA    NA    NA    NA    NA    NA    NA    NA
## 381:    288.4490         NA    NA    NA    NA    NA    NA    NA    NA    NA
## 382:    287.7440         NA    NA    NA    NA    NA    NA    NA    NA    NA
##      lag73 lag14 lag13 lag12 lag11 lag16 lag24 lag23 lag25  lag8  lag6  lag1
##      <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
##   1:     0  7.85  8.36  5.34  4.04  9.24  0.00  0.00     0  3.62  0.16     0
##   2:     0  8.36  5.34  4.04  5.89  8.88  0.00  0.00     0  1.30  0.00     0
##   3:     0  5.34  4.04  5.89  5.32  7.85  0.00  0.00     0  0.16  0.00     0
##   4:     0  4.04  5.89  5.32  3.62  8.36  0.00  0.06     0  0.00  0.00     0
##   5:     0  5.89  5.32  3.62  1.30  5.34  0.06  0.76     0  0.00  0.00     0
##  ---                                                                        
## 378:    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 379:    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 380:    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 381:    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
## 382:    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
##       lag2 hoursoftheday season   saat    gun  hafta     ay    tmax    tmin
##      <num>        <fctr> <fctr> <char> <char> <char> <char>   <num>   <num>
##   1:     0             0      2      0     14     20      5 296.812 277.368
##   2:     0             1      2      1     14     20      5 296.812 277.368
##   3:     0             2      2      2     14     20      5 296.812 277.368
##   4:     0             3      2      3     14     20      5 296.812 277.368
##   5:     0             4      2      4     14     20      5 296.812 277.368
##  ---                                                                       
## 378:    NA            17      2     17     29     22      5 304.153 282.081
## 379:    NA            18      2     18     29     22      5 304.153 282.081
## 380:    NA            19      2     19     29     22      5 304.153 282.081
## 381:    NA            20      2     20     29     22      5 304.153 282.081
## 382:    NA            21      2     21     29     22      5 304.153 282.081
##      trend lag1dswrf lag12dswrf actual predicted_trend residual_trend hour.y
##      <int>     <num>      <num>  <num>           <num>          <num>  <int>
##   1: 20733    0.0000  498.22800   0.00      0.10846081   -0.108460815      0
##   2: 20734    0.0000  536.99040   0.00      0.11997481   -0.119974809      1
##   3: 20735    0.0000  529.10560   0.00      0.12228199   -0.122281990      2
##   4: 20736    0.0000  505.62432   0.00      0.11917042   -0.119170419      3
##   5: 20737    0.0000  356.10880   0.07      0.07468105   -0.004681054      4
##  ---                                                                        
## 378: 21110  628.2720    0.00000     NA              NA             NA     17
## 379: 21111  557.5424    4.54800     NA              NA             NA     18
## 380: 21112  475.3048   38.40064     NA              NA             NA     19
## 381: 21113  394.9286   96.83520     NA              NA             NA     20
## 382: 21114  321.1014  169.20064     NA              NA             NA     21
##          date.y
##          <IDat>
##   1: 2024-05-14
##   2: 2024-05-14
##   3: 2024-05-14
##   4: 2024-05-14
##   5: 2024-05-14
##  ---           
## 378: 2024-05-29
## 379: 2024-05-29
## 380: 2024-05-29
## 381: 2024-05-29
## 382: 2024-05-29
# 'tmp' contains the 'actual' and 'predicted_trend' columns created above
ggplot(tmp, aes(x=datetime)) + 
  geom_line(aes(y=actual, color="Actual")) +
  geom_line(aes(y=predicted_trend, color="Predicted")) +
  labs(title = "Actual vs Predicted Production",
       subtitle = paste("Forecast from", min(tmp$date.x), "to", max(tmp$date.x)),
       x = "Date",
       y = "Production") +
  theme_minimal() +
  scale_color_manual(values = c("Actual" = "blue", "Predicted" = "red"))
## Warning: Removed 118 rows containing missing values (`geom_line()`).
## Warning: Removed 117 rows containing missing values (`geom_line()`).

Conclusions and Future Work

Summary

Our analysis and modeling efforts have demonstrated the effectiveness of using a combination of weather variables and historical production data to forecast hourly solar power production at the Edikli GES solar power plant. By iteratively building and refining multiple linear regression models, we identified Model 13 as the most accurate and robust predictor. This model incorporates a diverse set of features, including surface temperature, total cloud cover, and various lagged production values, capturing both short-term and long-term dependencies in the data. The inclusion of categorical time features further enhanced the model’s ability to account for daily, weekly, and monthly patterns in solar power production.

Our approach highlighted the importance of feature engineering in improving model performance. The creation of lagged variables and the inclusion of detailed weather data were crucial in capturing the temporal and environmental factors influencing solar power production. Additionally, the iterative model-building process allowed us to systematically evaluate and incorporate the most significant predictors, leading to a reasonably accurate forecasting model.
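To illustrate the kind of lagged variable discussed above, here is a minimal sketch using `shift()` from data.table. The `toy` table and its production values are made up for illustration, and the choice of lags 48, 72 and 96 is an assumption based on the column names in the output above and on the rule that production data is only available up to two days before the target date.

```r
library(data.table)

# Toy hourly series standing in for the plant's production data (values are made up)
toy <- data.table(
  datetime   = seq(as.POSIXct("2024-01-01 00:00:00", tz = "UTC"),
                   by = "hour", length.out = 120),
  production = round(pmax(0, sin((((0:119) %% 24) - 6) * pi / 12)) * 10, 2)
)

# shift() moves the column down by n rows, so the first n rows of each lag are NA
lags     <- c(48, 72, 96)
lag_cols <- paste0("lag", lags)
toy[, (lag_cols) := shift(production, n = lags, type = "lag")]

# Rows 1, 49 and 97 show where each lag column starts to carry values
toy[c(1, 49, 97)]
```

Rows with `NA` lags are dropped by `lm()` by default, which is consistent with the "observations deleted due to missingness" line in the model summary.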

Possible Improvements and Future Work

Let's check the weighted mean absolute percentage error (WMAPE) of our model over the forecast period.

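# WMAPE: sum of absolute errors divided by the sum of actual values.
# Rows where either value is NA drop out of the numerator via na.rm = TRUE.
calculate_wmape <- function(actual, predicted) {
  sum_abs_errors <- sum(abs(actual - predicted), na.rm = TRUE)
  total_actual <- sum(actual, na.rm = TRUE)
  if (total_actual == 0) {
    # WMAPE is undefined when total actual production is zero
    return(NA)
  } else {
    wmape <- sum_abs_errors / total_actual
    return(wmape)
  }
}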
wmape_value <- calculate_wmape(tmp$actual, tmp$predicted_trend)
print(paste("The WMAPE value is:", wmape_value))
## [1] "The WMAPE value is: 0.199644696554339"
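A single overall figure can hide days on which the forecast was much worse than average. As a rough sketch, the same WMAPE definition can be computed per day with a grouped data.table expression. The `toy` table below is a hypothetical stand-in for `tmp` with made-up numbers; only the column names `date.x`, `actual` and `predicted_trend` are taken from the report.

```r
library(data.table)

# Hypothetical stand-in for `tmp`: two days of made-up actual and predicted values
toy <- data.table(
  date.x          = as.IDate(rep(c("2024-05-14", "2024-05-15"), each = 4)),
  actual          = c(0, 2, 4, 2, 0, 3, 5, NA),
  predicted_trend = c(0.1, 1.5, 4.5, 2.4, 0.2, 2.0, 6.0, 1.0)
)

# Keep only rows where both values are present, so the numerator and the
# denominator cover the same hours, then compute WMAPE within each day
daily_wmape <- toy[!is.na(actual) & !is.na(predicted_trend),
                   .(wmape = sum(abs(actual - predicted_trend)) / sum(actual)),
                   by = date.x]
daily_wmape
```

Filtering to complete rows first is a design choice in this sketch; it differs slightly from `calculate_wmape`, which sums all non-missing actual values in the denominator.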

A WMAPE of roughly 0.20 is acceptable but not great. Several potential improvements could further enhance the model's accuracy and robustness.

One such improvement would be to explore non-linear models such as regression trees or random forests. These approaches could capture more complex relationships in the data that linear regression models may miss.
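As a minimal sketch of what such a comparison might look like, the example below fits a linear model and a regression tree (using the `rpart` package) to the same data and compares their errors on a held-out part. The data is synthetic and deliberately built so that irradiance only matters during daylight hours, an interaction that an additive linear model cannot represent; the variable names `hour` and `dswrf` and all numbers are assumptions for illustration, not results from the plant's data.

```r
library(rpart)  # recommended package shipped with standard R installations

set.seed(1)

# Synthetic data (not the plant's): production follows irradiance only in daylight
n   <- 2000
toy <- data.frame(
  hour  = sample(0:23, n, replace = TRUE),
  dswrf = runif(n, 0, 900)
)
daylight       <- toy$hour >= 6 & toy$hour <= 18
toy$production <- ifelse(daylight, 0.01 * toy$dswrf, 0) + rnorm(n, sd = 0.3)

# Simple holdout split: first 1500 rows for fitting, the rest for evaluation
train <- toy[1:1500, ]
test  <- toy[1501:2000, ]

fit_lm   <- lm(production ~ hour + dswrf, data = train)
fit_tree <- rpart(production ~ hour + dswrf, data = train, method = "anova")

# Root mean squared error on the held-out rows
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
errors <- c(
  lm   = rmse(test$production, predict(fit_lm, test)),
  tree = rmse(test$production, predict(fit_tree, test))
)
errors
```

Because the synthetic data was constructed to favor the tree, this only shows the mechanics of the comparison. Whether a tree-based model would actually beat Model 13 on the real data would have to be tested on the forecast period itself.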

Although we use many weather variables, more detailed weather data, such as wind speed and humidity, may be available. These could provide a more comprehensive understanding of the factors affecting solar power production.

By pursuing these extensions, we can continue to refine our forecasting model and enhance its ability to accurately predict solar power production, ultimately contributing to more effective energy management and planning.

Code

Our main code is also available on our GitHub page.

We also tried ARIMA models. These attempts were unsuccessful, but the code for them is included on the GitHub page.